Python: Efficiently check whether values in one list are in another list

Question

3

I have a DataFrame, user_df, with roughly 500,000 rows in the following format:

|  id  |  other_ids   |
|------|--------------|
|  1   |['abc', 'efg']|
|  2   |['bbb']       |
|  3   |['ccc', 'ddd']|

I also have a list named other_ids_that_clicked, which holds the full list of about 5,000 other ids that clicked:

 ['abc', 'efg', 'ccc']

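I want to de-duplicate other_ids_that_clicked against user_df by adding another column to the DataFrame that is 1 when any value from other_ids_that_clicked appears in user_df['other_ids'] and 0 otherwise: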

|  id  |  other_ids   |  clicked  |
|------|--------------|-----------|
|  1   |['abc', 'efg']|     1     |
|  2   |['bbb']       |     0     |
|  3   |['ccc', 'ddd']|     1     |

The way I am currently checking is to loop over other_ids_that_clicked for every row in user_df:

def otheridInList(row):
    # Return 1 if any clicked id appears in this row's other_ids, else 0.
    isin = False
    for other_id in other_ids_that_clicked:
        if other_id in row['other_ids']:
            isin = True
            break
    return 1 if isin else 0

This takes far too long, and I'm looking for advice on the best way to do it.

Thanks!

- user8766186

2 Answers

3

Use set

df['New']=(df.other_ids.apply(set)!=(df.other_ids.apply(set)-set(l))).astype(int)
df
Out[114]: 
   id   other_ids  New
0   1  [abc, efg]    1
1   2       [bbb]    0
2   3  [ccc, ddd]    1
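
As a variation on the same set idea, the sketch below builds the lookup set once and tests each row with set.isdisjoint. This is not the answerer's code: the sample frame is copied from the question, the name clicked_ids is made up for illustration, and l is assumed to be the list of clicked ids.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_ids": [["abc", "efg"], ["bbb"], ["ccc", "ddd"]],
})
l = ["abc", "efg", "ccc"]  # assumed: the list of clicked ids

# Build the lookup set once so each membership test is O(1) on average.
clicked_ids = set(l)  # hypothetical name, for illustration only

# A row is flagged when its ids and the clicked ids are NOT disjoint.
df["New"] = [int(not clicked_ids.isdisjoint(ids)) for ids in df["other_ids"]]
print(df)
```

Because isdisjoint accepts any iterable, the per-row lists do not need to be converted to sets first.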

- BENY

Nice... using set operations. - cs95

1

@cᴏʟᴅsᴘᴇᴇᴅ pd.DataFrame(df.other_ids.tolist()) blew my mind! Learned something :-) - BENY

- cs95 · Accepted Answer

You can actually speed this up quite a bit. Pull the column out, convert it into its own DataFrame, and use df.isin to do the check -

l = ['abc', 'efg', 'ccc']
df['clicked'] = pd.DataFrame(df.other_ids.tolist()).isin(l).any(axis=1).astype(int)

   id   other_ids  clicked
0   1  [abc, efg]        1
1   2       [bbb]        0
2   3  [ccc, ddd]        1

Details

First, convert other_ids into a list of lists -

i = df.other_ids.tolist()

i
[['abc', 'efg'], ['bbb'], ['ccc', 'ddd']]

Now, load it into a new DataFrame -

j = pd.DataFrame(i)

j
     0     1
0  abc   efg
1  bbb  None
2  ccc   ddd

Use isin to check -

k = j.isin(l)

k
       0      1
0   True   True
1  False  False
2   True  False

clicked can then be computed by using df.any to check whether any True is present in each row. The result is converted to integer.

k.any(axis=1).astype(int)

0    1
1    0
2    1
dtype: int64
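
If a few rows hold very long lists, the intermediate DataFrame above gets one column per list position, padded with None. A possible alternative, which is not part of the original answer, is to flatten the column with Series.explode and aggregate back per row. The sketch below assumes pandas 0.25 or later (where explode is available) and the default integer index from the example; flag_clicked is a hypothetical helper name.

```python
import pandas as pd


def flag_clicked(frame, clicked):
    """Return a 0/1 Series: 1 where any id in frame['other_ids'] is in clicked.

    Hypothetical helper for illustration; not from the original answer.
    """
    # One row per individual id; the original row label is repeated in the index.
    exploded = frame["other_ids"].explode()
    # Test every id, then collapse back to one value per original row label.
    return exploded.isin(clicked).groupby(level=0).any().astype(int)


# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_ids": [["abc", "efg"], ["bbb"], ["ccc", "ddd"]],
})
l = ["abc", "efg", "ccc"]

df["clicked"] = flag_clicked(df, l)
print(df)
```

Which variant is faster on 500,000 rows depends on how long the lists are, so it is worth timing both on real data.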