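I have created a Pandas DataFrame and can compute the standard deviation of one or more of its columns (at the column level). I need to determine the standard deviation across all the rows of a specific column. Below are the commands I have tried so far: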
# Will determine the standard deviation of all the numerical columns by default.
inp_df.std()
salary 8.194421e-01
num_months 3.690081e+05
no_of_hours 2.518869e+02
# Same as above command. Performs the standard deviation at the column level.
inp_df.std(axis = 0)
# Determines the standard deviation over only the salary column of the dataframe.
inp_df[['salary']].std()
salary 8.194421e-01
# Determines Standard Deviation for every row present in the dataframe. But it
# does this for the entire row and it will output values in a single column.
# One std value for each row.
inp_df.std(axis=1)
0 4.374107e+12
1 4.377543e+12
2 4.374026e+12
3 4.374046e+12
4 4.374112e+12
5 4.373926e+12
When I run the command below, every record shows "NaN". Is there a way to fix this?
# Trying to determine standard deviation only for the "salary" column at the
# row level.
inp_df[['salary']].std(axis = 1)
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
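[...] N-1, where N is 1. - filippo
[...] NaN, or did you not notice that you are computing the standard deviation of a single sample? Glad the problem is solved now! - filippo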
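To illustrate the point made in the comments, here is a minimal sketch of why the row-level call returns NaN. It assumes a small stand-in for inp_df with made-up values (the real data is not shown in the question). With axis=1 on a one-column frame, each row holds a single sample, and pandas' default ddof=1 divides by N - 1 = 0.

```python
import pandas as pd

# Hypothetical stand-in for inp_df; the values are invented for illustration.
inp_df = pd.DataFrame({
    "salary": [1.0, 2.0, 3.0, 4.0],
    "num_months": [10.0, 20.0, 30.0, 40.0],
    "no_of_hours": [5.0, 6.0, 7.0, 8.0],
})

# axis=1 on a single column: every row is one sample, so the default
# ddof=1 normalizes by N - 1 = 0 and each row comes back as NaN.
row_std_sample = inp_df[["salary"]].std(axis=1)

# ddof=0 normalizes by N = 1 instead, so each row is 0.0
# (a single value has no spread).
row_std_population = inp_df[["salary"]].std(axis=1, ddof=0)

# The spread of salary across all rows is the column-level result.
salary_std = inp_df["salary"].std()

# A row-level std only carries information with two or more columns.
row_std_two_cols = inp_df[["salary", "no_of_hours"]].std(axis=1)

print(row_std_sample)
print(row_std_population)
print(salary_std)
print(row_std_two_cols)
```

If the goal is how much salary varies from record to record, the column-level call is probably the one you want. The row-level call is only meaningful when at least two columns are selected.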