np.where with multiple return values

Question

10

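Using pandas and numpy, I am trying to process a column in a DataFrame and want to create a new column with values derived from it. So if column x contains the value 1, the new column should contain a; for the value 2 it should contain b, and so on.

I can do this for a single condition, for example: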

df['new_col'] = np.where(df['col_1'] == 1, 'a', 'n/a')

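I can find examples with multiple conditions, e.g. if x = 3 or x = 4 the value should be a, but nothing that does something like this: if x = 3 the value should be a, and if x = 4 the value should be c.

I tried running two lines of code, for example: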

df['new_col'] = np.where(df['col_1'] == 1, 'a', 'n/a')
df['new_col'] = np.where(df['col_1'] == 2, 'b', 'n/a')

But obviously the second line overwrites the first. Am I missing something crucial?

- DGraham

4 Answers

3

I think numpy's choose() is the best option for you.

import numpy as np
choices = 'abcde'
N = 10
np.random.seed(0)
data = np.random.randint(1, len(choices) + 1, size=N)
print(data)
print(np.choose(data - 1, choices))

Output:

[5 1 4 4 4 2 4 3 5 1]
['e' 'a' 'd' 'd' 'd' 'b' 'd' 'c' 'e' 'a']

- Stop harming Monica

1

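You can define a dict with the desired substitutions, then loop over the DataFrame column and fill in the new one. There may be a more elegant way, but this works: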

import numpy as np
import pandas as pd

# create a dummy DataFrame
df = pd.DataFrame(np.random.randint(2, size=(6, 4)), columns=['col_1', 'col_2', 'col_3', 'col_4'], index=range(6))

# create a dict with your desired substitutions:
swap_dict = {  0 : 'a',
               1 : 'b',
             999 : 'zzz',  }

# introduce new column and fill with swapped information:
for i in df.index:
    df.loc[i, 'new_col'] = swap_dict[  df.loc[i, 'col_1']  ]

print(df)

This returns something like:

   col_1  col_2  col_3  col_4 new_col
0      1      1      1      1       b
1      1      1      1      1       b
2      0      1      1      0       a
3      0      1      0      0       a
4      0      0      1      1       a
5      0      0      1      0       a
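
As a sketch of the "more elegant way" hinted at above: Series.map accepts a dict and looks up the whole column at once, so the explicit loop is not strictly needed. The sample values below are made up for illustration and are not the random data from the answer.

```python
import pandas as pd

# hypothetical sample data, chosen to resemble the output shown above
df = pd.DataFrame({'col_1': [1, 1, 0, 0, 0, 0]})

swap_dict = {0: 'a', 1: 'b', 999: 'zzz'}

# map looks up each value of col_1 in swap_dict;
# values that are not keys of the dict would become NaN instead of raising KeyError
df['new_col'] = df['col_1'].map(swap_dict)
print(df)
```

Unlike the loop, this does not fail when a value is missing from the dict, which may or may not be what you want.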

- rde

1

Use pandas Series.map instead of where.

import pandas as pd
df = pd.DataFrame({'col_1' : [1,2,4,2]})
print(df)

def ab_ify(v):
    if v == 1:
        return 'a'
    elif v == 2:
        return 'b'
    else:
        return None

df['new_col'] = df['col_1'].map(ab_ify)
print(df)

# output:
#
#    col_1
# 0      1
# 1      2
# 2      4
# 3      2
#    col_1 new_col
# 0      1       a
# 1      2       b
# 2      4    None
# 3      2       b

- SpeedCoder5


- jezrael · Accepted Answer

I think you can use loc:

df.loc[df['col_1'] == 1, 'new_col'] = a
df.loc[df['col_1'] == 2, 'new_col'] = b
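
A runnable sketch of the loc approach, assuming a small made-up DataFrame and treating a and b as placeholder strings (neither is specified in the question):

```python
import pandas as pd

# made-up sample data for illustration
df = pd.DataFrame({'col_1': [1, 2, 4, 2]})

# placeholder replacement values (assumption: plain strings)
a, b = 'a', 'b'

# each assignment writes only to the rows where its mask is True,
# so the second line does not overwrite the result of the first
df.loc[df['col_1'] == 1, 'new_col'] = a
df.loc[df['col_1'] == 2, 'new_col'] = b
print(df)
```

Rows that match neither condition are left as NaN in new_col.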

Or:

df['new_col'] = np.where(df['col_1'] == 1, a, np.where(df['col_1'] == 2, b, np.nan))
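
The nested form can be sketched as follows, assuming made-up sample data and string outputs. The fallback here is the string 'n/a' from the question rather than np.nan, so that every branch has the same type:

```python
import numpy as np
import pandas as pd

# made-up sample data for illustration
df = pd.DataFrame({'col_1': [1, 2, 4, 2]})

# the inner np.where supplies the "else" branch of the outer one
df['new_col'] = np.where(df['col_1'] == 1, 'a',
                         np.where(df['col_1'] == 2, 'b', 'n/a'))
print(df)
```

This reads reasonably well for two or three conditions; with more, the nesting gets hard to follow.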

Or numpy.select:

df['new_col'] = np.select([df['col_1'] == 1, df['col_1'] == 2],[a, b], default=np.nan)
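
A minimal sketch of numpy.select under the same assumptions (made-up data, string outputs). It keeps the fallback a string as well, so NumPy does not have to reconcile string and float types:

```python
import numpy as np
import pandas as pd

# made-up sample data for illustration
df = pd.DataFrame({'col_1': [1, 2, 4, 2]})

# conditions are checked in order; the first True one picks the matching choice
conditions = [df['col_1'] == 1, df['col_1'] == 2]
choices = ['a', 'b']

# default is used for rows where no condition holds
df['new_col'] = np.select(conditions, choices, default='n/a')
print(df)
```

Keeping the conditions and choices in two parallel lists makes it easy to add further cases without deeper nesting.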

Or use Series.map, which returns NaN by default where there is no match:

d =  { 0 : 'a',  1 : 'b'}

df['new_col'] = df['col_1'].map(d)
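
For the values in the question (1 gives a, 2 gives b), a sketch might look like this. The sample data and the fillna step are assumptions added for illustration; fillna is only needed if you want a fallback label instead of NaN:

```python
import pandas as pd

# made-up sample data for illustration
df = pd.DataFrame({'col_1': [1, 2, 4, 2]})

# keys are the values to look for, dict values are the replacements
d = {1: 'a', 2: 'b'}

# unmatched values become NaN, which fillna then replaces with 'n/a'
df['new_col'] = df['col_1'].map(d).fillna('n/a')
print(df)
```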