Finding emojis in text with Python

Question

5

Hello, I am trying to find all emojis in downloaded tweets using Python 2.7.

I have tried the following code:

import os
import codecs
import emoji
from nltk.tokenize import word_tokenize

def extract_emojis(token):
    emoji_list = []
    if token in emoji.UNICODE_EMOJI:
        emoji_list.append(token)
    return emoji_list

for tweet in os.listdir(tweets_path):
    with codecs.open(tweets_path+tweet, 'r', encoding='utf-8') as input_file:
        line = input_file.readline()
        while line:
            line = word_tokenize(line)
            for token in line:
                print extract_emojis(token)

            line = input_file.readline()

However, I only get empty lists instead of the emojis. Given the tweet below:

schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!

the code outputs:

[]

instead of the expected output:

[😡, 😴]

Any help? Thanks!

- m4sh4

3 Answers

1

This is for Python 2 -

import emoji

x = "schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!"
[i for i in x.split() if unicode(i, "utf-8") in emoji.UNICODE_EMOJI]

# Output
['\xf0\x9f\x98\xa1', '\xf0\x9f\x98\xb4']
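
For comparison, here is a minimal Python 3 sketch of the same whitespace-splitting idea. In Python 3 a str is already Unicode, so no unicode() call is needed. KNOWN_EMOJI is a hypothetical two-entry stand-in for emoji.UNICODE_EMOJI, used only so the snippet runs without the emoji package. Note that splitting on whitespace only finds an emoji that stands alone as a token.

```python
# Hypothetical stand-in for emoji.UNICODE_EMOJI (assumption: only these two emojis matter).
KNOWN_EMOJI = {"\U0001F621", "\U0001F634"}  # pouting face, sleeping face


def emoji_tokens(text, known=KNOWN_EMOJI):
    """Return the whitespace-separated tokens that are exactly one known emoji."""
    return [token for token in text.split() if token in known]


spaced = "schuld van de sossen \U0001F621 SP.a: wij hebben niks gedaan \U0001F634 Groen"
glued = "wij hebben niks gedaan\U0001F634 Groen"  # emoji attached to a word

spaced_hits = emoji_tokens(spaced)
glued_hits = emoji_tokens(glued)
print(spaced_hits)  # both emojis are found
print(glued_hits)   # empty: the token is "gedaan" plus the emoji, not the emoji alone
```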

- Sushant

0

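There are many ways to extract emojis from a string in Python.

One notable way is to use the emoji library.

If you are working with files, make sure to read the file with utf-8 encoding (and utf-8-sig if it was saved that way).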
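
A small Python 3 sketch of why the encoding matters when the file was saved as utf-8-sig. The file name tweet.txt and the sample text are made up for illustration. Reading such a file as plain utf-8 leaves a byte order mark (U+FEFF) at the start of the text, while reading it as utf-8-sig strips it.

```python
import os
import tempfile

tweet = "wij hebben niks gedaan \U0001F634"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tweet.txt")  # hypothetical file name

    # Save with a BOM, as some editors and export tools do.
    with open(path, "w", encoding="utf-8-sig") as output_file:
        output_file.write(tweet)

    # Plain utf-8 keeps the BOM as the first character of the text.
    with open(path, "r", encoding="utf-8") as input_file:
        read_as_utf8 = input_file.read()

    # utf-8-sig removes the BOM if present.
    with open(path, "r", encoding="utf-8-sig") as input_file:
        read_as_utf8_sig = input_file.read()

print(repr(read_as_utf8[0]))      # the BOM character
print(read_as_utf8_sig == tweet)  # the original text, unchanged
```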

Here is how to list all emojis present in a string, along with the number of emojis and the type of each emoji.

Code:

#import required libraries
import re
import emoji
from emoji import UNICODE_EMOJI

#getting all emojis as lists
all_emojis = list(UNICODE_EMOJI.keys())

#defining sentence
sentence = "schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!"

#getting Emoji Count
emoji_count = sum([sentence.count(emoj) for emoj in UNICODE_EMOJI])
#listing all Emojis
listed_emojis = ','.join(re.findall(f"[{''.join(all_emojis)}]", str(sentence)))
#listing all Emoji Types
emoji_types = ','.join([UNICODE_EMOJI[detect_emoji].upper()[1:-1] for detect_emoji in listed_emojis.split(',')])

#Displaying Sentence, Emoji Count, Emojis and Emoji Types
print(f"Sentence: {sentence}\nListed Emojis: {listed_emojis}\nCount: {emoji_count}\nEmoji Types: {emoji_types}")

Output:

Sentence: schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!
Listed Emojis: 😡,😴
Count: 2
Emoji Types: POUTING_FACE,SLEEPING_FACE
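
If the emoji package is not available, a similar listing can be sketched with the standard library only. This is an approximation under stated assumptions: EMOJI_PATTERN is a hand-picked set of code-point ranges, not the library's emoji table, so it can miss some emojis (flags, for example) and match a few non-emoji symbols. The names come from unicodedata.name, so they use spaces rather than underscores.

```python
import re
import unicodedata
from collections import Counter

# Assumption: these two ranges are "close enough" to emoji for a rough listing.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

sentence = (
    "schuld van de sossen \U0001F621 SP.a: wij hebben niks gedaan "
    "\U0001F634 Groen: we gaan energie VERBIEDEN!"
)

# List every matching character in order of appearance.
listed_emojis = EMOJI_PATTERN.findall(sentence)
emoji_count = len(listed_emojis)

# Official Unicode character names, with a fallback for unnamed code points.
emoji_types = [unicodedata.name(char, "UNKNOWN") for char in listed_emojis]

# How often each emoji occurs.
per_emoji = Counter(listed_emojis)

print(f"Listed Emojis: {','.join(listed_emojis)}")
print(f"Count: {emoji_count}")
print(f"Emoji Types: {','.join(emoji_types)}")
```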

Hope this helps. If you have any questions, please post them here and I will do my best to resolve them... :)

- Littin Rajan

- Irfanuddin · Accepted Answer

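Make sure your text is decoded as utf-8: text.decode('utf-8')

To find all emojis in the text, you must split the text character by character: [str for str in decode]

Save all the emojis in a list: [c for c in allchars if c in emoji.UNICODE_EMOJI]

Something like this: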

import emoji
text     = "  lorum ipsum  de "
decode   = text.decode('utf-8')
allchars = [str for str in decode]
list     = [c for c in allchars if c in emoji.UNICODE_EMOJI]
print list

[u'\u62b1\u6b49', u'\u5fc3\u6697\u4e86', u'\u7231\u4f60', u'\u7231\u5fc3', u'\u4eba\u5bb6', u'\u732a']

To get your emojis back, try this
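
The same three steps (decode, split into characters, filter) can be sketched in Python 3 as follows. EMOJI_RANGES and is_emoji are hypothetical stand-ins for the emoji.UNICODE_EMOJI lookup, covering only a few common code-point ranges, so this is an approximation rather than the library's behaviour. It also treats each code point separately, so multi-code-point emojis such as flags or skin-tone variants are not handled.

```python
# Assumption: a rough approximation of "is this character an emoji".
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport and related blocks
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
]


def is_emoji(char):
    """Return True if the single character falls in one of the assumed ranges."""
    code_point = ord(char)
    return any(low <= code_point <= high for low, high in EMOJI_RANGES)


def extract_emojis(raw):
    """Decode bytes as utf-8 if needed, then keep only the emoji characters."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    return [char for char in text if is_emoji(char)]


# Bytes as they might come from a file opened in binary mode.
raw_tweet = (
    "schuld van de sossen \U0001F621 SP.a: wij hebben niks gedaan \U0001F634 Groen"
).encode("utf-8")

found = extract_emojis(raw_tweet)
print(found)
print("".join(found))  # the emojis as one displayable string
```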