Z-score normalization in a Pandas DataFrame (using Python)

Question

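I'm programming in Python 3 (Spyder) and I have a table whose object type is "pandas.core.frame.DataFrame". I want to z-score normalize the values in that table (subtract the row mean from each value and divide by the row standard deviation), so that every row has a mean of 0 and a standard deviation of 1. I tried two approaches.

First approach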

from scipy.stats import zscore
zetascore_table=zscore(table,axis=1)

Second approach

rows=table.index.values
columns=table.columns
import numpy as np
for i in range(len(rows)):
    for j in range(len(columns)):
         table.loc[rows[i],columns[j]]=(table.loc[rows[i],columns[j]] - np.mean(table.loc[rows[i],]))/np.std(table.loc[rows[i],])
table

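Both approaches seem to run, but when I check the mean and standard deviation of each row, they are not 0 and 1 but other float values. I don't know what could be going wrong.

Thanks a lot for your help!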

- pablo11prade
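A possible explanation for the second approach: the loop writes each result back into table while later iterations still call np.mean and np.std on that same row, so the statistics are computed from a mix of raw and already-scaled values. A minimal sketch of a fix, assuming a small made-up table (the data and the names row_mean, row_std and fixed are illustrative, not from the original post), computes the row statistics once and writes into a copy:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's table
table = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 50.0]],
    index=["r1", "r2"],
    columns=["a", "b", "c", "d"],
)

# Compute each row's statistics once, from the untouched values
row_mean = table.mean(axis=1)
row_std = table.std(axis=1, ddof=0)  # ddof=0 matches the np.std default

# Write results into a copy so the source rows are never modified mid-loop
fixed = table.copy()
for r in table.index:
    for c in table.columns:
        fixed.loc[r, c] = (table.loc[r, c] - row_mean[r]) / row_std[r]

print(fixed.mean(axis=1))         # approximately 0 for every row
print(fixed.std(axis=1, ddof=0))  # approximately 1 for every row
```

Tiny values such as 1e-17 instead of an exact 0 are ordinary floating-point rounding.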

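It may be worth noting that (a) df['z score'] = zscore(df['col A']) and (b) df['z score'] = (df['col A']-df['col A'].mean())/df['col A'].std() do not give exactly the same z-scores. (a) uses ddof=0 (population standard deviation), whereas (b) uses ddof=1 (sample standard deviation) by default. Depending on the application, you can make the ddof values match, e.g. using df['col A'].std(ddof=0) in (b) makes the two equal (the default for zscore() is ddof=0). See https://dev59.com/jrjoa4cB1Zd3GeqPAIql for more on ddof. - J Prestone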
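The ddof difference described in this comment can be checked with a short sketch. The five-value column is made up for illustration, and the variable names are assumptions rather than anything from the thread:

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical sample column
df = pd.DataFrame({"col A": [1.0, 2.0, 3.0, 4.0, 5.0]})

# (a) scipy: standard deviation with ddof=0 by default
z_scipy = zscore(df["col A"])

# (b) pandas: Series.std() uses ddof=1 by default
z_pandas_default = (df["col A"] - df["col A"].mean()) / df["col A"].std()

# (b) again, with ddof set to 0 to match scipy
z_pandas_ddof0 = (df["col A"] - df["col A"].mean()) / df["col A"].std(ddof=0)

print(np.allclose(z_scipy, z_pandas_default))  # False
print(np.allclose(z_scipy, z_pandas_ddof0))    # True
```

This also suggests why a row that was normalized with ddof=0 will not show a standard deviation of exactly 1 when checked with the pandas default.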

2 Answers

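Sorry, after thinking about it I found another, simpler way to compute the z-score (subtract each row's mean and divide the result by that row's standard deviation) instead of using for loops.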

table = table.T  # transpose so that each original row becomes a column
sd = table.std(axis=0, ddof=0)  # per-column standard deviation (ddof=0, as np.std uses by default)
mean = table.mean(axis=0)  # per-column mean
numerator = table - mean  # numerator in the formula for the z-score
z_score = numerator / sd
z_norm_table = z_score.T  # transpose back: the initial table, with all values z-scored by row

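I checked, and now the mean of every row is 0 or very close to 0 and the standard deviation is 1 or very close to 1, so this works for me. Sorry, I have very little programming experience, and sometimes simple things take me many attempts before I find a solution.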

- pablo11prade
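The same row-wise result can probably be reached without transposing, by asking pandas for row statistics directly. This is a minimal sketch on made-up data; row_mean and row_std are illustrative names, and ddof=0 is chosen here to match np.std:

```python
import pandas as pd

# Hypothetical sample data: two rows, three columns
table = pd.DataFrame(
    {"a": [1.0, 10.0], "b": [2.0, 20.0], "c": [4.0, 60.0]},
    index=["r1", "r2"],
)

row_mean = table.mean(axis=1)        # one mean per row
row_std = table.std(axis=1, ddof=0)  # one standard deviation per row

# axis=0 aligns the per-row Series with the DataFrame's index
z_norm_table = table.sub(row_mean, axis=0).div(row_std, axis=0)

print(z_norm_table)
```

With sub and div, axis=0 tells pandas to match the Series labels against the row index, which is what makes the transpose unnecessary.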

- BGG16 · Accepted Answer

The code below computes the z-score of each value in a column of a Pandas df, then saves the z-scores in a new column (called "num_1_zscore" here). Very easy to do.

from scipy.stats import zscore
import pandas as pd

# Create a sample df
df = pd.DataFrame({'num_1': [1,2,3,4,5,6,7,8,9,3,4,6,5,7,3,2,9]})

# Calculate the zscores and drop zscores into new column
df['num_1_zscore'] = zscore(df['num_1'])

print(df)  # in a Jupyter/IPython session, display(df) also works
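The answer above works on a single column. For the row-wise case in the question, a sketch along the same lines might pass axis=1 to zscore and wrap the result back into a DataFrame so the labels are kept. The sample table and the name z_by_row are assumptions for illustration, and passing table.to_numpy() rather than the DataFrame itself is a choice made here to keep the input type unambiguous:

```python
import pandas as pd
from scipy.stats import zscore

# Hypothetical sample data
table = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 8.0, 12.0]],
    index=["r1", "r2"],
    columns=["a", "b", "c"],
)

# axis=1 normalizes each row; zscore uses ddof=0 by default
z_by_row = pd.DataFrame(
    zscore(table.to_numpy(), axis=1),
    index=table.index,
    columns=table.columns,
)

print(z_by_row)
```

When verifying the result, z_by_row.std(axis=1, ddof=0) is the comparable check, since the pandas default of ddof=1 would report values slightly above 1.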