How to plot a correlation matrix/heatmap with categorical and numerical variables

Question

3

I have 4 variables: 2 are nominal (dtype=object) and the other 2 are numeric (dtypes int and float).

df.head(1)

OUT:
OS_type|Week_day|clicks|avg_app_speed
iOS|Monday|400|3.4

Now I want to visualize the DataFrame as a seaborn heatmap.

import numpy as np
import seaborn as sns
ax = sns.heatmap(df)

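But I get an error saying that only numeric values can be used, not categorical variables. How should I process the data correctly before feeding it into the heatmap?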

- LaLaTi

1

You could try encoding the categorical columns as binary data and then computing a correlation matrix. Related topic - Alexandre B.
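A minimal sketch of what this comment suggests, assuming the binary encoding is done with `pd.get_dummies` and the correlation matrix with `DataFrame.corr()`. The sample rows are hypothetical and only mimic the shape of the DataFrame in the question.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame.
df = pd.DataFrame({
    "OS_type": ["iOS", "Android", "iOS", "Android", "iOS", "Android"],
    "Week_day": ["Monday", "Monday", "Tuesday", "Tuesday", "Monday", "Tuesday"],
    "clicks": [400, 320, 410, 300, 390, 310],
    "avg_app_speed": [3.4, 2.9, 3.6, 2.7, 3.3, 3.0],
})

# One-hot encode the categorical columns into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["OS_type", "Week_day"], dtype=float)

# Pearson correlation between every pair of (now numeric) columns.
corr = encoded.corr()

# To draw it (requires seaborn and matplotlib):
# import seaborn as sns
# ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

Each category becomes its own row and column in the heatmap, so indicator columns derived from the same variable are strongly negatively correlated with each other by construction.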

1 Answer


- Taq Seorangpun · Answer 1

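The values plotted in the heatmap should all lie on a common scale, here 0 to 1. For the association between two numerical variables you can use Pearson's r (take its absolute value, since r itself ranges from -1 to 1); for two categorical variables, (bias-corrected) Cramér's V; and for a categorical and a numerical variable, the correlation ratio.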
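The three measures above can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: the helper names `cramers_v`, `correlation_ratio` and `association_matrix` are hypothetical, the sample rows are made up, and for brevity the Cramér's V here is the plain version without the bias correction mentioned above.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Plain (uncorrected) Cramér's V between two categorical series."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence, then the chi-squared statistic.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Correlation ratio (eta) of a numeric series grouped by a categorical one."""
    grand_mean = values.mean()
    groups = values.groupby(categories)
    # Between-group sum of squares divided by total sum of squares.
    between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    total = ((values - grand_mean) ** 2).sum()
    return float(np.sqrt(between / total)) if total > 0 else 0.0


def association_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Symmetric matrix of association strengths in [0, 1] for mixed columns."""
    numeric = set(df.select_dtypes(include="number").columns)
    cols = list(df.columns)
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in numeric and b in numeric:
                score = abs(df[a].corr(df[b]))           # |Pearson's r|
            elif a in numeric:
                score = correlation_ratio(df[b], df[a])  # b is categorical
            elif b in numeric:
                score = correlation_ratio(df[a], df[b])  # a is categorical
            else:
                score = cramers_v(df[a], df[b])
            out.loc[a, b] = out.loc[b, a] = score
    return out


# Hypothetical sample data shaped like the question's DataFrame.
df = pd.DataFrame({
    "OS_type": ["iOS", "Android", "iOS", "Android", "iOS", "Android"],
    "Week_day": ["Monday", "Monday", "Tuesday", "Tuesday", "Monday", "Tuesday"],
    "clicks": [400, 320, 410, 300, 390, 310],
    "avg_app_speed": [3.4, 2.9, 3.6, 2.7, 3.3, 3.0],
})
assoc = association_matrix(df)

# To draw it (requires seaborn and matplotlib):
# import seaborn as sns
# ax = sns.heatmap(assoc, annot=True, vmin=0, vmax=1)
```

The resulting matrix has one row and one column per original variable, which keeps the heatmap the same size as the DataFrame regardless of how many categories each nominal column has.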

As for creating numeric representations of categorical variables, there are several options:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv('some_source.csv')  # has categorical var 'categ_var'

# method 1: uses pandas
df['numerized1'] = df['categ_var'].astype('category').cat.codes

# method 2: uses pandas, codes ordered by descending frequency (0 = most frequent)
# value_counts() is computed once instead of once per row
freq_order = df['categ_var'].value_counts().index
df['numerized2'] = df['categ_var'].map(freq_order.get_loc)

# method 3: uses sklearn, result is the same as method 1
lbl = LabelEncoder()
df['numerized3'] = lbl.fit_transform(df['categ_var'])

# method 4: uses pandas; xyz captures a list of the unique values 
df['numerized4'], xyz = pd.factorize(df['categ_var'])
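To make the difference between these encodings concrete, here is a small sketch on a hypothetical series (the values are made up for illustration). It shows that the same label receives different integers depending on the method, so the codes are arbitrary identifiers rather than meaningful magnitudes.

```python
import pandas as pd

# Hypothetical categorical column.
s = pd.Series(
    ["Monday", "Friday", "Monday", "Tuesday", "Monday", "Friday"],
    name="categ_var",
)

# Category codes: categories are sorted, so Friday=0, Monday=1, Tuesday=2.
codes_sorted = s.astype("category").cat.codes
print(codes_sorted.tolist())  # [1, 0, 1, 2, 1, 0]

# factorize: codes follow order of first appearance, so Monday=0, Friday=1, Tuesday=2.
codes_seen, uniques = pd.factorize(s)
print(codes_seen.tolist())    # [0, 1, 0, 2, 0, 1]
print(list(uniques))          # ['Monday', 'Friday', 'Tuesday']

# Frequency rank: the most frequent value gets 0 (Monday x3, Friday x2, Tuesday x1).
freq_order = s.value_counts().index
freq_rank = s.map(freq_order.get_loc)
print(freq_rank.tolist())     # [0, 1, 0, 2, 0, 1]
```

Because the integer assigned to a label depends on the encoding, a Pearson correlation computed directly on such codes for a nominal variable changes with the method chosen and should be interpreted with care.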