Replace blanks in a NumPy array

Question


python, arrays, numpy

4

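The third column of my NumPy array is age. About 75% of the entries in this column are valid and 25% are blank. The second column is gender, and using some operations I calculated that the average age of males in the dataset is 30 and the average age of females is 28.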

I want to replace every blank age value for males with 30 and every blank age value for females with 28.
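The question doesn't show how the averages were computed. One way to get a per-gender mean straight from the string array is sketched below. It assumes column 1 holds '1' for male and '0' for female (the encoding described further down) and that column 2 holds the age as text. The sample rows and the helper name `mean_age` are hypothetical, so the resulting means differ from the 30 and 28 quoted above.

```python
import numpy as np

# Hypothetical sample: column 0 = class, column 1 = gender ('1' male, '0' female), column 2 = age
x = np.array([['3', '1', '22'],
              ['1', '0', '38'],
              ['3', '0', '26'],
              ['3', '0', ''],
              ['1', '1', '26'],
              ['3', '1', '32']], dtype='<U82')

def mean_age(data, gender):
    """Mean of the non-blank ages for rows whose gender column equals `gender`."""
    rows = (data[:, 1] == gender) & (data[:, 2] != '')
    # Convert only the non-blank age strings to float before averaging
    return data[rows, 2].astype(float).mean()

male_mean = mean_age(x, '1')    # (22 + 26 + 32) / 3
female_mean = mean_age(x, '0')  # (38 + 26) / 2
print(male_mean, female_mean)
```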

But I can't seem to get it to work. Does anyone have a suggestion, or know what I'm doing wrong?

Here is my code:

# my entire data set is stored in a numpy array defined as x

ismale = x[::,1]=='male'
maleAgeBlank = x[ismale][::,2]==''
x[ismale][maleAgeBlank][::,2] = 30

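For some reason, after running the code above, when I type x to display the dataset there are still blanks, even though I set them to 30. Note that I can't use x[maleAgeBlank], because that list would include some female data points, since the females have not been excluded.

Is there a way to get the result I want? For some reason, if I run x[ismale][::,1] = 1 (setting the "male" column to 1), it works, but x[ismale][maleAgeBlank][::,2] = 30 does not.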
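A likely explanation: indexing with a boolean mask (`x[ismale]`) is advanced indexing, and advanced indexing returns a copy rather than a view. Each further index in the chain then works on that copy, so the final assignment never reaches `x`. The sketch below uses a small hypothetical array with the same column layout (gender in column 1, age in column 2, '1' = male) to reproduce the problem. It then shows a single indexing expression that does write into `x`.

```python
import numpy as np

# Hypothetical rows: [class, gender ('1' = male, '0' = female), age]
x = np.array([['3', '1', ''],
              ['1', '0', ''],
              ['3', '1', '40']], dtype='<U82')

ismale = x[::, 1] == '1'
maleAgeBlank = x[ismale][::, 2] == ''

# x[ismale] is a copy, so this chained assignment only changes a temporary array
x[ismale][maleAgeBlank][::, 2] = 30
print(x[0, 2] == '')  # True: x itself is unchanged

# Build one mask over all rows and index rows and column in a single step;
# this assigns into x directly
blank_male = ismale & (x[:, 2] == '')
x[blank_male, 2] = '30'
print(x[:, 2])  # ['30' '' '40']
```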

Sample of the array:

#output from typing x
array([['3', '1', '22', ..., '0', '7.25', '2'],
   ['1', '0', '38', ..., '0', '71.2833', '0'],
   ['3', '0', '26', ..., '0', '7.925', '2'],
   ..., 
   ['3', '0', '', ..., '2', '23.45', '2'],
   ['1', '1', '26', ..., '0', '30', '0'],
   ['3', '1', '32', ..., '0', '7.75', '1']], 
  dtype='<U82')

#output from typing x[0]

array(['3', '1', '22', '1', '0', '7.25', '2'], 
  dtype='<U82')

Note that in the output above I have changed the second column to 0 for female and 1 for male.
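The question doesn't say how the gender column was recoded. If it originally held the strings 'male' and 'female' (an assumption), a vectorized recode could look like the sketch below. The sample rows are hypothetical.

```python
import numpy as np

# Hypothetical rows with the original text labels: [class, sex, age]
x = np.array([['3', 'male', '22'],
              ['1', 'female', '38'],
              ['3', 'male', '']], dtype='<U82')

# np.where(condition, a, b) picks '1' for male rows and '0' otherwise;
# assigning to the whole column x[:, 1] writes into x in place
x[:, 1] = np.where(x[:, 1] == 'male', '1', '0')
print(x[:, 1])  # ['1' '0' '1']
```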

- Terence Chow

Can you post a sample of your array? - user1301404

3 answers

2

You can use the where function:

import numpy as np

arr = np.array([['3', '1', '22', '1', '0', '7.25', '2'],
                ['3', '', '22', '1', '0', '7.25', '2']],
               dtype='<U82')

blank = np.where(arr == '')

arr[blank] = 20

array([[u'3', u'1', u'22', u'1', u'0', u'7.25', u'2'],
       [u'3', u'20', u'22', u'1', u'0', u'7.25', u'2']], 
      dtype='<U82')

If you want to change a specific column, you can do the following:

male = np.where(arr[:, 1]=='') # where 1 is the column
arr[male] = 30

female = np.where(arr[:, 2]=='') # where 2 is the column
arr[female] = 28

- user1301404

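"where" is efficient, but the current solution doesn't check each row's gender value, and it changes every blank, not just the blanks in the age column. - ASGM

Doesn't he want to change the blank ages to the average? The age column only contains 1 and 2 for male and female. So he needs two "where" calls targeting only those two columns. - user1301404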
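Following up on these comments, here is a sketch of the `np.where` approach that only touches the age column and checks gender first. It assumes the question's encoding ('1' = male, '0' = female) and uses hypothetical sample rows. With a 1-D mask, `np.where(mask)` returns a one-element tuple of row indices, so the sketch takes element `[0]` and pairs those rows with column 2 explicitly.

```python
import numpy as np

# Hypothetical rows: [class, gender ('1' = male, '0' = female), age]
arr = np.array([['3', '1', ''],
                ['1', '0', ''],
                ['3', '1', '22']], dtype='<U82')

# Row indices of blank ages, split by gender
male_rows = np.where((arr[:, 1] == '1') & (arr[:, 2] == ''))[0]
female_rows = np.where((arr[:, 1] == '0') & (arr[:, 2] == ''))[0]

# Index rows and the age column together so only the age cells change
arr[male_rows, 2] = '30'
arr[female_rows, 2] = '28'
print(arr[:, 2])  # ['30' '28' '22']
```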

0

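You could try iterating over the array in a simpler way. It's not the most efficient solution, but it should get the job done.

# Iterating over a 2-D array yields views of its rows, so assignments write back into x
for row in x:
    if row[2] == '':            # age is blank
        if row[1] == '1':       # gender column: '1' = male, '0' = female
            row[2] = '30'
        else:
            row[2] = '28'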

- ASGM

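There's no point iterating over a NumPy array with a for loop: by iterating, you lose NumPy's advantages. - user1301404

@void True, I'm not saying there isn't a better solution. But if the OP just wants to get this particular task done quickly, I hope this helps. - ASGM

Using where is more efficient. Please check my answer. - user1301404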
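For comparison with the loop above, the same rule (a blank age becomes 30 for males and 28 for females) can be written without a Python-level loop. The sketch uses the three-argument form of `np.where`, assumes the question's '1'/'0' gender encoding, and uses hypothetical sample rows.

```python
import numpy as np

# Hypothetical rows: [class, gender ('1' = male, '0' = female), age]
x = np.array([['3', '1', ''],
              ['1', '0', ''],
              ['3', '0', '35']], dtype='<U82')

age = x[:, 2]
blank = age == ''
male = x[:, 1] == '1'

# Inner where: the fill value per row ('30' for males, '28' otherwise).
# Outer where: use the fill value only where the age is blank, else keep the age.
# The new column is computed in full before it is written back into x.
x[:, 2] = np.where(blank, np.where(male, '30', '28'), age)
print(x[:, 2])  # ['30' '28' '35']
```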


- Akavall · Accepted Answer

How about this:

import numpy as np

my_data = np.array([['3', '1', '22', '0', '7.25', '2'],
                    ['1', '0', '38', '0', '71.2833', '0'],
                    ['3', '0', '26', '0', '7.925', '2'],
                    ['3', '0', '', '2', '23.45', '2'],
                    ['1', '1', '26', '0', '30', '0'],
                    ['3', '1', '32', '0', '7.75', '1']],
                   dtype='<U82')

# Column 1 is gender ('1' = male, '0' = female, as in the question); column 2 is age
ismale = my_data[:, 1] == '1'
isfemale = my_data[:, 1] == '0'
missing_age = my_data[:, 2] == ''
maleAgeBlank = missing_age & ismale
femaleAgeBlank = missing_age & isfemale
# Index rows and column in one expression so the assignment modifies my_data itself
my_data[maleAgeBlank, 2] = '30'
my_data[femaleAgeBlank, 2] = '28'

Result:

>>> my_data
array([[u'3', u'1', u'22', u'0', u'7.25', u'2'],
       [u'1', u'0', u'38', u'0', u'71.2833', u'0'],
       [u'3', u'0', u'26', u'0', u'7.925', u'2'],
       [u'3', u'0', u'28', u'2', u'23.45', u'2'],
       [u'1', u'1', u'26', u'0', u'30', u'0'],
       [u'3', u'1', u'32', u'0', u'7.75', u'1']],
      dtype='<U82')