Deleting rows from a numpy array doesn't work

Question

3

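I am trying to split a numpy array of data points into a test set and a training set. To do this, I randomly select rows from the array as the training set, and the remaining rows form the test set.

Here is my code: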

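import numpy

matrix = numpy.loadtxt("matrix_vals.data", delimiter=',', dtype=float)
matrix_rows, matrix_cols = matrix.shape

# training set: 50 row indices drawn at random
randvals = numpy.random.randint(matrix_rows, size=50)
train = matrix[randvals,:]
# test set: whatever is left after deleting those rows
test = numpy.delete(matrix, randvals, 0)

print("matrix.shape:", matrix.shape)
print("train.shape:", train.shape)
print("test.shape:", test.shape)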

But the output I get is:

matrix.shape: (130, 14)
train.shape: (50, 14)
test.shape: (89, 14)

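This is clearly wrong, because the row counts of the training and test sets should add up to the total number of rows in the matrix, but here they add up to more (50 + 89 = 139, not 130). Can anyone help me figure out what is going wrong?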

- SanjanaS801

2 Answers

3

Why not use scikit-learn's train_test_split function and avoid all the trouble?

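import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

# test_size is the fraction of rows placed in the test set (here about 50 of 130);
# pass train_size=50 instead if the training set should have 50 rows
train, test = train_test_split(matrix, test_size=50.0/130.0)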

- Charlie Haley

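Will this give me a random split? Edit: just checked the documentation, and it does! Thanks for the alternative, I didn't know about this function. I'd still like to know why my code doesn't work, though. - SanjanaS801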

Yes. You can test it yourself if you like. See the documentation linked in my answer above for how it works and what the parameters mean. - Charlie Haley

- ali_m · Accepted Answer

Because you are generating random integers with replacement, randvals will almost certainly contain repeated indices.

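Indexing with repeated indices returns the same row multiple times, so matrix[randvals,:] is guaranteed to give an output with exactly 50 rows, whether or not any indices are repeated.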

In contrast, np.delete(matrix, randvals, 0) removes each distinct row index only once, so it reduces the number of rows only by the number of unique values in randvals.
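
The difference is easy to see on a tiny array. The following is only an illustrative sketch: the array `demo` and the index list `idx` are made up for the example and are not part of the original question.

```python
import numpy as np

# Hypothetical 5-row array, used only for illustration.
demo = np.arange(10).reshape(5, 2)
idx = np.array([0, 2, 2, 4])  # index 2 appears twice

picked = demo[idx]              # fancy indexing returns row 2 twice
rest = np.delete(demo, idx, 0)  # each listed row is removed only once

print(picked.shape)  # (4, 2)
print(rest.shape)    # (2, 2)
```

Here 4 + 2 = 6 rows come out of a 5-row array, which is the same kind of mismatch as 50 + 89 = 139 in the question.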

Try comparing:

print(np.unique(randvals).shape[0] == matrix_rows - test.shape[0])
# True

To generate a vector of unique random indices between 0 and matrix_rows - 1, you can use np.random.choice with replace=False:

uidx = np.random.choice(matrix_rows, size=50, replace=False)

Then matrix[uidx].shape[0] + np.delete(matrix, uidx, 0).shape[0] == matrix_rows.
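
Putting it together, here is a minimal sketch of a complete split. It rests on a few assumptions that are not in the original post: the data is replaced by a random (130, 14) array, the seed is arbitrary, and NumPy's newer `np.random.default_rng` generator is used in place of the legacy `np.random.choice`. The second option (shuffling with `permutation` and slicing) is an alternative technique, not what the answer above proposes.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability

# Stand-in for the data loaded from matrix_vals.data (values are made up).
matrix = rng.random((130, 14))
n_rows = matrix.shape[0]

# Option 1: unique indices plus a boolean mask for the remainder.
uidx = rng.choice(n_rows, size=50, replace=False)
mask = np.zeros(n_rows, dtype=bool)
mask[uidx] = True
train, test = matrix[mask], matrix[~mask]

# Option 2: shuffle all row indices once, then slice.
perm = rng.permutation(n_rows)
train2, test2 = matrix[perm[:50]], matrix[perm[50:]]

print(train.shape, test.shape)    # (50, 14) (80, 14)
print(train2.shape, test2.shape)  # (50, 14) (80, 14)
```

Note that the mask version returns the training rows in their original order, whereas the permutation version returns them shuffled; either way every row lands in exactly one of the two sets.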