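I am trying to use SMOTE from the imblearn package in Python, but my data has a lot of missing values, which causes the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I checked the parameters here, and there does not seem to be one for handling missing values.
Is there a way to generate synthetic samples with missing values?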
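SMOTE does not handle missing values on its own. One way around this is to impute them before oversampling, for example in a pipeline:
# Imports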
from collections import Counter
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
# Load data
bc = load_breast_cancer()
X, y = bc.data, bc.target
# Initial number of samples per class
print('Number of samples for both classes: {} and {}.'.format(*Counter(y).values()))
# SMOTEd class distribution (the original data has no missing values)
print('Dataset has %s missing values.' % np.isnan(X).sum())
_, y_resampled = SMOTE().fit_resample(X, y)
print('Number of samples for both classes: {} and {}.'.format(*Counter(y_resampled).values()))
# Generate artificial missing values
X[X > 1.0] = np.nan
print('Dataset has %s missing values.' % np.isnan(X).sum())
# Impute the missing values (column mean by default), then oversample
_, y_resampled = make_pipeline(SimpleImputer(), SMOTE()).fit_resample(X, y)
print('Number of samples for both classes: {} and {}.'.format(*Counter(y_resampled).values()))
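
As for why there is no missing-value parameter: SMOTE creates each synthetic sample by taking a minority sample, finding one of its nearest neighbours, and interpolating between the two. The sketch below is a plain-Python illustration of that idea, not imblearn's actual implementation; the helper names `euclidean` and `interpolate` and the two sample points are made up for the example. It shows that a single NaN makes both the neighbour distance and the interpolated feature undefined.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; any NaN coordinate makes the result NaN."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate(sample, neighbor, rng):
    """SMOTE-style synthetic point on the segment between two samples."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [s + lam * (n - s) for s, n in zip(sample, neighbor)]


rng = random.Random(0)
a = [1.0, 2.0]
b = [3.0, float("nan")]  # hypothetical sample with a missing second feature

distance = euclidean(a, b)
synthetic = interpolate(a, b, rng)

print(math.isnan(distance))      # True: the neighbour search cannot rank this point
print(math.isnan(synthetic[1]))  # True: the missing value leaks into the new sample
```

So the missing values have to be filled in (or the affected rows or columns dropped) before resampling, which is what the imputation step in the pipeline above does.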