Alternative ways to match substrings

3
Is there another way to implement this solution? Using str.contains() is not very elegant when there are many keywords to match.
from pandas import DataFrame

df = DataFrame({'A':['Cat had a nap','Dog had puppies','Did you see a Donkey','kitten got angry','puppy was cute']})
dic = {'Cat':'Cat','kitten':'Cat','Dog':'Dog','puppy':'Dog'}

               A
0   Cat had a nap
1   Dog had puppies
2   Did you see a Donkey
3   kitten got angry
4   puppy was cute


df['Cat'] = (df['A'].astype(str).str.contains('Cat')|df['A'].astype(str).str.contains('kitten')).replace({False:0, True:1})
df['Dog'] = (df['A'].astype(str).str.contains('Dog')|df['A'].astype(str).str.contains('puppy')).replace({False:0, True:1})
df



    A                    Cat    Dog
0   Cat had a nap          1    0
1   Dog had puppies        0    1
2   Did you see a Donkey   0    0
3   kitten got angry       1    0
4   puppy was cute         0    1
1 Answer

3

Use | as the regex OR inside str.contains, then cast the boolean result to integers with astype:

df['Cat'] = df['A'].astype(str).str.contains('Cat|kitten').astype(int)
df['Dog'] = df['A'].astype(str).str.contains('Dog|puppy').astype(int)

Equivalently, converting the column to strings only once:

a = df['A'].astype(str)
df['Cat'] = a.str.contains('Cat|kitten').astype(int)
df['Dog'] = a.str.contains('Dog|puppy').astype(int)

print (df)
                      A  Cat  Dog
0         Cat had a nap    1    0
1       Dog had puppies    0    1
2  Did you see a Donkey    0    0
3      kitten got angry    1    0
4        puppy was cute    0    1
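
If the column may hold missing values or mixed letter case, the same call can probably handle both through its own parameters. The sketch below is only an illustration on hypothetical data (the None row and the upper-case 'KITTEN' are assumptions, not part of the question): na=False counts missing values as non-matches, and case=False ignores letter case.

```python
import pandas as pd

# Hypothetical data: one missing value and one upper-case keyword
df = pd.DataFrame({'A': ['Cat had a nap', 'Dog had puppies', None, 'KITTEN got angry']})

# na=False turns missing values into False instead of NaN,
# case=False makes the regex match regardless of letter case
df['Cat'] = df['A'].str.contains('Cat|kitten', case=False, na=False).astype(int)
print(df)
```

Note that astype(str) would instead turn a missing value into a literal string before matching, so na=False is arguably the safer choice when the keywords themselves could appear in that string.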

A more dynamic solution with a dictionary of lists:

dic = {'Cat':['Cat','kitten'],'Dog':['Dog','puppy']}
for k, v in dic.items():
    df[k] = df['A'].astype(str).str.contains('|'.join(v)).astype(int)
print (df)
                      A  Cat  Dog
0         Cat had a nap    1    0
1       Dog had puppies    0    1
2  Did you see a Donkey    0    0
3      kitten got angry    1    0
4        puppy was cute    0    1
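
The question's dic maps each keyword to its label, while the loop above expects each label mapped to a list of keywords. A minimal sketch of one way to bridge the two, assuming the flat mapping from the question: group the keywords by label, and escape each keyword with re.escape so that characters such as '.' or '+' are matched literally. The names flat and groups are illustrative only.

```python
import re
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'A': ['Cat had a nap', 'Dog had puppies', 'Did you see a Donkey',
                         'kitten got angry', 'puppy was cute']})

# Flat keyword -> label mapping, as given in the question
flat = {'Cat': 'Cat', 'kitten': 'Cat', 'Dog': 'Dog', 'puppy': 'Dog'}

# Invert it into label -> list of keywords
groups = defaultdict(list)
for keyword, label in flat.items():
    groups[label].append(keyword)

for label, keywords in groups.items():
    # re.escape keeps regex metacharacters in a keyword from being interpreted
    pattern = '|'.join(re.escape(kw) for kw in keywords)
    df[label] = df['A'].astype(str).str.contains(pattern).astype(int)

print(df)
```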

Is there any other way besides str.contains? @jezrael - Sharvari Gc
@SharvariGc - What is the problem with it? One possible solution is df['A'].apply(lambda x: 'dog' in x) - jezrael
Thanks! To be more accurate, use: df['A'].apply(lambda x: 'dog' in x.lower()) - Sharvari Gc
No problem with it. I just needed to build it with a different syntax. @jezrael - Sharvari Gc
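
The apply-based idea from the comments can be extended to several keywords per label without any regex. This is only a sketch under the assumption that plain, case-insensitive substring tests are what is wanted; it reuses the dictionary-of-lists layout from the answer, and it is likely slower than str.contains on large frames because it runs a Python function per row.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Cat had a nap', 'Dog had puppies', 'Did you see a Donkey',
                         'kitten got angry', 'puppy was cute']})
dic = {'Cat': ['Cat', 'kitten'], 'Dog': ['Dog', 'puppy']}

for label, keywords in dic.items():
    lowered = [kw.lower() for kw in keywords]
    # Plain substring test with `in`; lower() on both sides makes it case-insensitive
    df[label] = df['A'].astype(str).apply(
        lambda text: int(any(kw in text.lower() for kw in lowered))
    )

print(df)
```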
