How do I plot a pandas DataFrame using andrews_curves?

3

I have the following pandas dataframe:

df = pd.read_csv('path/file/file.csv',
                 header=0, sep=',', names=['PhraseId', 'SentenceId', 'Phrase', 'Sentiment'])

I want to plot it using andrews_curves, and I tried the following:

andrews_curves(df, 'Name')

Is there a way to plot it? This is the content of the csv file:
PhraseId, SentenceId, Phrase, Sentiment
1, 1, A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 1
2, 1, A series of escapades demonstrating the adage that what is good for the goose, 2
3, 1, A series, 2
4, 1, A, 2
5, 1, series, 2
6, 1, of escapades demonstrating the adage that what is good for the goose, 2
7, 1, of, 2
8, 1, escapades demonstrating the adage that what is good for the goose, 2
9, 1, escapades, 2
10, 1, demonstrating the adage that what is good for the goose, 2
11, 1, demonstrating the adage, 2
12, 1, demonstrating, 2
13, 1, the adage, 2
14, 1, the, 2
15, 1, adage, 2
16, 1, that what is good for the goose, 2
17, 1, that, 2
18, 1, what is good for the goose, 2
19, 1, what, 2
20, 1, is good for the goose, 2
21, 1, is, 2
22, 1, good for the goose, 3
23, 1, good, 3
24, 1, for the goose, 2
25, 1, for, 2
26, 1, the goose, 2
27, 1, goose, 2
28, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 2
29, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story, 2

What is wrong with what you tried? Does it raise an error? - Uyghur Lives Matter
1 Answer

2
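In the documentation page you linked, the Iris dataset has a column named 'Name'. When you call
andrews_curves(data, 'Name')

the rows of data are grouped by the value of Name. That is why you get lines in three different colors for the Iris dataset.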
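To make the grouping concrete, here is a minimal sketch on a small made-up frame. It is a stand-in for Iris, not the real dataset: the feature names `f1` to `f4` and the class labels are hypothetical. Each row becomes one curve, and each distinct value in the class column gets one color and one legend entry.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import andrews_curves

rng = np.random.default_rng(0)

# Hypothetical stand-in for Iris: four numeric features plus a 'Name' class column
toy = pd.DataFrame(rng.normal(size=(9, 4)), columns=["f1", "f2", "f3", "f4"])
toy["Name"] = ["a", "b", "c"] * 3

# One curve is drawn per row; rows sharing a Name share a color
ax = andrews_curves(toy, "Name")

n_curves = len(ax.get_lines())                 # number of plotted curves
n_classes = len(ax.get_legend().get_texts())   # number of legend entries
```

With nine rows and three distinct names, `n_curves` and `n_classes` should come out as 9 and 3 respectively.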

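Your dataset has three columns: A, B, and C. To call andrews_curves on your df, you first need to decide which value to group by. For example, if it is the values of column C, call:

andrews_curves(data, 'C')

If you want to group by the column names A, B, and C instead, you first need to melt the DataFrame from wide format to long format, and then call andrews_curves on the variable column (which holds the value A, B, or C for each row):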
import numpy as np
import pandas as pd
import pandas.plotting as pdplt
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 1000)
df = pd.DataFrame({'A': np.sin(x**2)/x,
                   'B': np.sin(x)*np.exp(-x),
                   'C': np.cos(x)*x})
pdplt.andrews_curves(pd.melt(df), 'variable')
plt.show()

Output (the plot image is not reproduced here).


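This is too abstract. I don't know what "switching the key" means. Please post a sample of text.csv and the code that raises the TypeError. - unutbu
Thanks for the help @unutbu. I updated the question; I want to plot Andrews curves, and I have added a small sample of the data. - skwoi
@skwoi: andrews_curves expects every column of the DataFrame, apart from the grouping column, to be numeric. You would probably need to convert the "Phrase" column to some numeric format, and I am not sure how you would do that. The other problem is that you need to specify a column to group by, and there is no obvious candidate in your data. - unutbu
Thanks for the answer. What other kind of plot would you recommend for this type of data? - skwoi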
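As a rough illustration of unutbu's point about numeric columns, the sketch below replaces the text column with two simple numeric features before plotting. The features (`n_chars` and `n_words`) and the choice of Sentiment as the grouping column are assumptions made for this example, not something suggested in the thread, and whether the resulting curves are meaningful for this data remains an open question. The rows are copied from the sample CSV in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd
from pandas.plotting import andrews_curves

# A few rows copied from the question's sample data
df = pd.DataFrame({
    "PhraseId": [3, 4, 22, 23, 24],
    "SentenceId": [1, 1, 1, 1, 1],
    "Phrase": ["A series", "A", "good for the goose", "good", "for the goose"],
    "Sentiment": [2, 2, 3, 3, 2],
})

# Assumption: stand in for the text with character and word counts,
# so that every non-class column is numeric
features = pd.DataFrame({
    "n_chars": df["Phrase"].str.len(),
    "n_words": df["Phrase"].str.split().str.len(),
    "Sentiment": df["Sentiment"],
})

# Assumption: group (color) the curves by Sentiment
ax = andrews_curves(features, "Sentiment")
```

Passing the original df directly would not work, because the Phrase strings cannot be used as curve coefficients and the frame has no 'Name' column.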
