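I have a dataset and want to use SelectKBest with chi2 to get the feature importances, but the feature scores returned by SelectKBest are all nan.
The data file and the code file are available at this link.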
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Path to the data file
file_path = r"D:\Data_Sets\Mobile_Prices\data.csv"
# Read the CSV file into the south_data data frame
south_data = pd.read_csv(file_path)
# Printing the number of data points and the number of columns of south_data data frame
print("The number of data points in the data :", south_data.shape[0])
print("The number of columns in the data :", south_data.shape[1])
# Printing the head of south_data data frame
print(south_data.head())
# Check for the nulls
print(south_data.isnull().sum())
# Separate the features (x) from the target column (y)
x = south_data.drop("tss", axis=1)
y = south_data["tss"]
# Fit SelectKBest with the chi2 score function and print the per-feature scores
bestfit = SelectKBest(score_func=chi2, k=5)
features = bestfit.fit(x, y)
x_new = features.transform(x)
print(features.scores_)
# The output of features.scores_ is displayed as
# array([nan, nan, nan, nan, nan, nan, nan, nan, nan])
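For reference, here is a minimal pure-Python sketch of the textbook chi-squared statistic, the sum over classes of (observed - expected)^2 / expected, to show one way a nan score can arise. It is not scikit-learn's implementation. The helper name chi2_scores and the toy data are made up for illustration and are not taken from data.csv. The point it illustrates: when a non-negative feature column sums to zero, the expected frequency is zero for every class, the statistic becomes 0 / 0, and a NumPy-based computation would report nan. Whether this is what happens with the dataset above is an assumption that would need to be checked against the actual columns.

```python
import math


def chi2_scores(rows, labels):
    """Hypothetical helper: textbook chi-squared score for each feature column.

    rows   -- list of samples, each a list of non-negative feature values
    labels -- list of class labels, one per sample
    """
    n_samples = len(rows)
    n_features = len(rows[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(n_features):
        # Total "count" of feature j over all samples
        feature_total = sum(row[j] for row in rows)
        score = 0.0
        for c in classes:
            class_prob = labels.count(c) / n_samples
            # Observed total of feature j within class c
            observed = sum(row[j] for row, lab in zip(rows, labels) if lab == c)
            # Expected total if feature j were independent of the class
            expected = class_prob * feature_total
            if expected == 0:
                # The column sums to zero, so the term is 0 / 0 (undefined).
                # Plain Python would raise ZeroDivisionError; mark it as nan.
                score = math.nan
                break
            score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


# Toy data (made up): the second feature is zero for every sample
rows = [[1, 0], [2, 0], [5, 0], [6, 0]]
labels = [0, 0, 1, 1]
scores = chi2_scores(rows, labels)
print(scores)
```

In this toy example the first feature gets a finite score and the all-zero second feature gets nan. On the real data, a quick check along the same lines would be to look at x.sum() for columns that total zero and at y.nunique() to confirm the target has more than one class.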