How should I use the Arial font when creating a word cloud with Python 3 and Google Colab?

Question

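I have a Twitter text dataset containing English, Arabic, and Persian text. I want to create a word cloud from it, but the Arabic and Persian words appear as empty boxes in the word-cloud image. I've heard there are three ways to solve this:

Use a different encoding: I tried "UTF-8", "UTF-16", "UTF-32", and "ISO-8859-1", but none of them solved the problem.
Use arabic_reshaper: no effect.
Use a font that supports all three languages, such as Arial: when I try to change the word-cloud font to Arial, I get the following error: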

Input

wordcloud = WordCloud(font_path='arial', stopwords=stopwords, background_color="white", max_font_size=50, max_words=100).generate(reshaped_text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Output

cannot open resource

This code works fine in Anaconda but not in Google Colab. The only thing left to solve is what path to pass as font_path in Google Colab.

- Vahid the Great
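The "cannot open resource" error typically means the font file could not be opened, so the real question is where a usable font file lives on the Colab machine. A minimal sketch, assuming you know the font's file name but not its directory: the hypothetical helper find_font_file below walks a list of directories and compares file names case-insensitively, so "arial" also matches an uploaded ARIAL.TTF. It is not part of the wordcloud package.

```python
import os


def find_font_file(name, search_dirs):
    """Return the full path of the first file named `name` found under any
    of `search_dirs`, comparing file names case-insensitively.

    If `name` has no extension, '.ttf' is assumed. Returns None when no
    match is found. (Hypothetical helper for illustration.)
    """
    target = name.lower()
    if not os.path.splitext(target)[1]:
        target += ".ttf"  # 'arial' -> 'arial.ttf'
    for base in search_dirs:
        # os.walk simply yields nothing for directories that do not exist
        for dirpath, _dirnames, filenames in os.walk(base):
            for fname in filenames:
                if fname.lower() == target:
                    return os.path.join(dirpath, fname)
    return None
```

For example, find_font_file("arial", ["/usr/share/fonts", "/content/drive/My Drive"]) searches the system font folder and a mounted Google Drive (both locations are assumptions); a result of None means no Arial file is available on the machine, and the returned path, when there is one, can be passed directly as font_path.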

I don't think you are specifying the file correctly. It should be something like: wordcloud = WordCloud(font_path='/Library/Fonts/Arial.ttf').generate(text)

Yes, I guess I have to fix font_path, but the path you suggested returns the same error. By the way, I'm using the Google Colab platform.

3 Answers

I uploaded the font to my Google Drive and used the following code, which worked:

wordcloud = WordCloud(font_path='/content/drive/My Drive/ARIAL.TTF',stopwords=stopwords, background_color="white", max_font_size=50, max_words=100).generate(get_display(arabic_reshaper.reshape(all_tweets)))

- Vahid the Great
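The snippet above assumes Google Drive is already mounted at /content/drive. A minimal sketch of that step, assuming the font was uploaded to the root of the Drive: google.colab.drive.mount is Colab's own API, while the helper name drive_font_path is hypothetical. The answer uses the folder name "My Drive"; newer Colab versions appear to use "MyDrive", so the sketch tries both. The ImportError fallback only lets the function run outside Colab.

```python
import os


def drive_font_path(filename, mount_point="/content/drive"):
    """Mount Google Drive (inside Colab) and return the path to `filename`
    in the Drive root, or None if it is not there.

    Hypothetical helper; only google.colab.drive.mount is a real Colab API.
    """
    try:
        from google.colab import drive  # available only inside Colab
        drive.mount(mount_point)  # asks for authorization on first use
    except ImportError:
        pass  # not running in Colab: assume the files are already on disk
    # The answer uses "My Drive"; newer Colab versions may use "MyDrive".
    for root_name in ("My Drive", "MyDrive"):
        path = os.path.join(mount_point, root_name, filename)
        if os.path.isfile(path):
            return path
    return None
```

In Colab, font_path = drive_font_path("ARIAL.TTF") could then be passed to WordCloud(font_path=font_path, ...); if it returns None, the file name or upload location does not match.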

You may want to try these Persian-specific word cloud libraries.

Also take a look at the following:

and

- Mohammad Heydari

- Aliakbar Saleh · Accepted Answer

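When working with Persian, there are three problems you need to solve:

Persian characters are not displayed correctly. This can be fixed through the encoding or the font; I think you have already solved this.
Persian characters appear, but the letters are disconnected. In this case, use the reshape function from arabic_reshaper. Keep in mind that this does not fully solve the problem; you also need the third step.
The Persian words come out in left-to-right order (so they read reversed); use the python-bidi library to fix this.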
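The reshape-then-reorder steps described above can be wrapped in one helper so that pure-English tweets pass through untouched. A minimal sketch, assuming the arabic-reshaper and python-bidi packages are installed; the helper names and the Unicode ranges used to detect Arabic script are illustrative choices, not part of either library.

```python
import re

# Arabic, Arabic Supplement, and the two Arabic Presentation Forms blocks.
# Persian-specific letters such as پ and گ are in the main Arabic block.
ARABIC_SCRIPT = re.compile("[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF]")


def contains_arabic_script(text):
    """Return True if `text` contains at least one Arabic-script character."""
    return ARABIC_SCRIPT.search(text) is not None


def prepare_for_wordcloud(text):
    """Join Arabic/Persian letters and put them in right-to-left display
    order before the text is handed to WordCloud."""
    if not contains_arabic_script(text):
        return text  # nothing to reshape, so the extra packages are not needed
    import arabic_reshaper  # pip install arabic-reshaper
    from bidi.algorithm import get_display  # pip install python-bidi

    reshaped = arabic_reshaper.reshape(text)  # connect the letter forms
    return get_display(reshaped)  # reorder for right-to-left display
```

Importing the two packages lazily is a design choice: the helper still works on English-only input in environments where they are not installed.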

For example, I successfully created a word cloud with the following code:

import matplotlib.pyplot as plt
import arabic_reshaper
from bidi.algorithm import get_display
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

txt = '''I would love to try or hear the sample audio your app can produce. I    do not want to purchase, because I've purchased so many apps that say they do something and do not deliver.  

Can you please add audio samples with text you've converted? I'd love to see the end results.

Thanks!

سلام حال سلام سلام سلام حال شما چطوره است نیست

'''

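# A bare name like 'arial' is resolved by Pillow, which looks for arial.ttf in the system
# font folders (works on Windows); on Colab, pass the full path to an uploaded font file.
word_cloud = WordCloud(font_path='arial', stopwords=STOPWORDS, background_color="white", max_font_size=50, max_words=100)
# Reshape to connect the Persian letters, then reorder them for right-to-left display
word_cloud = word_cloud.generate_from_text(get_display(arabic_reshaper.reshape(txt)))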

plt.imshow(word_cloud, interpolation='bilinear')
plt.axis("off")
plt.show()
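If words still render as empty boxes after reshaping, the font file itself may lack glyphs for those characters, which is the first problem listed in this answer. A minimal sketch for checking that, assuming the fontTools package is available (recent matplotlib versions depend on it): font_codepoints reads the font's character map with TTFont.getBestCmap, and missing_chars lists the characters the font cannot draw. Both helper names are hypothetical. Since arabic_reshaper converts letters to presentation-form code points, check the reshaped text rather than the raw text.

```python
def font_codepoints(font_path):
    """Return the set of Unicode code points the font maps to glyphs."""
    from fontTools.ttLib import TTFont  # pip install fonttools

    font = TTFont(font_path)
    try:
        cmap = font.getBestCmap() or {}  # None if the font has no Unicode cmap
    finally:
        font.close()
    return set(cmap)


def missing_chars(text, codepoints):
    """Return the distinct non-whitespace characters of `text` whose code
    points are not in `codepoints`, in order of first appearance."""
    missing = []
    for ch in text:
        if not ch.isspace() and ord(ch) not in codepoints and ch not in missing:
            missing.append(ch)
    return missing
```

For example, missing_chars(arabic_reshaper.reshape(txt), font_codepoints('/content/drive/My Drive/ARIAL.TTF')) returning an empty list would indicate that the font covers every character in the text.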