Remove rows with differing column values from two DataFrames

Question


I have these two DataFrames:

Active:

Customer_ID | product_No| Rating
7           | 111       | 3.0
7           | 222       | 1.0
7           | 333       | 5.0
7           | 444       | 3.0

User:

Customer_ID | product_No| Rating
9           | 111       | 2.0
9           | 222       | 5.0
9           | 666       | 5.0
9           | 555       | 3.0

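I want to find the ratings for the products that both users have rated (e.g. 111, 222) and remove any products that are not common to both (e.g. 444, 333, 555, 666). The new DataFrames should therefore look like this: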

Active:

Customer_ID | product_No| Rating
7           | 111       | 3.0
7           | 222       | 1.0

User:

Customer_ID | product_No| Rating
9           | 111       | 2.0
9           | 222       | 5.0

I don't know how to do this without a loop. Please help.

Here is my code so far:

import pandas as pd

ratings = pd.read_csv("ratings.csv", names=['Customer_ID', 'product_No', 'Rating'])
active = ratings[ratings['Customer_ID'] == 7]
user = ratings[ratings['Customer_ID'] == 9]

- fsfr23

4 Answers

Use query and reference the other DataFrame with @:

Active.query('product_No in @User.product_No')

   Customer_ID  product_No  Rating
0            7         111     3.0
1            7         222     1.0

User.query('product_No in @Active.product_No')

   Customer_ID  product_No  Rating
0            9         111     2.0
1            9         222     5.0
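For a copy-pasteable version, here is a self-contained sketch of the query approach. It rebuilds the question's two frames in memory (lowercase names active and user, as in the question's code) instead of reading ratings.csv; in query, @name refers to a Python variable in the calling scope and "in" becomes a membership test.

```python
import pandas as pd

# Rebuild the question's example data in memory (stand-in for ratings.csv)
active = pd.DataFrame({"Customer_ID": [7, 7, 7, 7],
                       "product_No": [111, 222, 333, 444],
                       "Rating": [3.0, 1.0, 5.0, 3.0]})
user = pd.DataFrame({"Customer_ID": [9, 9, 9, 9],
                     "product_No": [111, 222, 666, 555],
                     "Rating": [2.0, 5.0, 5.0, 3.0]})

# '@user.product_No' pulls the other frame's column from the local scope;
# 'in' keeps the rows whose product_No appears anywhere in that column
active_common = active.query("product_No in @user.product_No")
user_common = user.query("product_No in @active.product_No")

print(active_common)
print(user_common)
```

Like boolean indexing, query keeps the original row labels, so the filtered rows retain their index values from the source frames.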

- piRSquared

I tried using an INNER JOIN as follows:

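import pandas as pd

df1 = pd.read_csv('a.csv')  # the "Active" ratings
df2 = pd.read_csv('b.csv')  # the "User" ratings
print(df1)
print(df2)

# Inner join on product_No keeps only products present in both frames;
# clashing non-key columns get the default suffixes '_x' (left) and '_y' (right)
df_ij = pd.merge(df1, df2, on='product_No', how='inner')
print(df_ij)

df_list = []
for suffx in ['_x', '_y']:
    # Select one side's columns and restore the original column names
    df_e = df_ij[['Customer_ID' + suffx, 'product_No', 'Rating' + suffx]].copy()
    df_e.columns = list(df1)
    df_list.append(df_e)

print(df_list[0])
print(df_list[1])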

It prints the following:

# print df1
   Customer_ID  product_No  Rating
0            7         111       3
1            7         222       1
2            7         333       5
3            7         444       3

# print df2
   Customer_ID  product_No  Rating
0            9         111       2
1            9         222       5
2            9         777       5
3            9         555       3

# print the INNER JOINed df
   Customer_ID_x  product_No  Rating_x  Customer_ID_y  Rating_y
0              7         111         3              9         2
1              7         222         1              9         5

# print the first df you want, with common 'product_No'
   Customer_ID  product_No  Rating
0            7         111       3
1            7         222       1

# print the second df you want, with common 'product_No'
   Customer_ID  product_No  Rating
0            9         111       2
1            9         222       5

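The inner join keeps only the rows whose product_No appears in both DataFrames. Because the two frames share column names, the joined DataFrame appends suffixes (_x for the left frame, _y for the right) to the columns not used as the join key, so they can be told apart. You then only need to select each side's columns, using the appropriate suffix, to get the final results you want.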
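The suffix-stripping step can be sketched with in-memory data and rename instead of overwriting .columns. The helper take_side is hypothetical, introduced only for this illustration; merge's default suffixes are ('_x', '_y') and can be changed with its suffixes parameter.

```python
import pandas as pd

active = pd.DataFrame({"Customer_ID": [7, 7, 7, 7],
                       "product_No": [111, 222, 333, 444],
                       "Rating": [3.0, 1.0, 5.0, 3.0]})
user = pd.DataFrame({"Customer_ID": [9, 9, 9, 9],
                     "product_No": [111, 222, 666, 555],
                     "Rating": [2.0, 5.0, 5.0, 3.0]})

# Only product_No values present in both frames survive the inner join;
# Customer_ID and Rating exist on both sides, so they come back as *_x / *_y
joined = pd.merge(active, user, on="product_No", how="inner")


def take_side(df, suffix):
    """Hypothetical helper: select one side's columns and drop its suffix."""
    cols = ["Customer_ID" + suffix, "product_No", "Rating" + suffix]
    mapping = {"Customer_ID" + suffix: "Customer_ID", "Rating" + suffix: "Rating"}
    return df[cols].rename(columns=mapping)


active_common = take_side(joined, "_x")
user_common = take_side(joined, "_y")

print(active_common)
print(user_common)
```

rename returns a new DataFrame, so there is no need to mutate a column slice of joined in place.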

There is a good example of an INNER JOIN here.

- edesz

Here is your answer:

import pandas as pd

dict1 = {"Customer_id": [7, 7, 7, 7],
         "Product_No": [111, 222, 333, 444],
         "rating": [3.0, 1.0, 5.0, 3.0]}
active = pd.DataFrame(dict1)
dict2 = {"Customer_id": [9, 9, 9, 9],
         "Product_No": [111, 222, 666, 555],
         "rating": [2.0, 5.0, 5.0, 3.0]}
user = pd.DataFrame(dict2)

# Inner join keeps only the Product_No values both customers rated
df3 = pd.merge(active, user, on="Product_No", how="inner")
print(df3)

# Split the joined frame back into one frame per customer
active = df3[["Customer_id_x", "Product_No", "rating_x"]]
print(active)
user = df3[["Customer_id_y", "Product_No", "rating_y"]]
print(user)
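The frames produced above keep the suffixed column names (Customer_id_x, rating_x, and so on). One way to avoid suffixes entirely, sketched here as an alternative rather than as this answer's method, is to merge each frame against only the other frame's key column (a "semi-join"). It uses the same column names as this answer.

```python
import pandas as pd

active = pd.DataFrame({"Customer_id": [7, 7, 7, 7],
                       "Product_No": [111, 222, 333, 444],
                       "rating": [3.0, 1.0, 5.0, 3.0]})
user = pd.DataFrame({"Customer_id": [9, 9, 9, 9],
                     "Product_No": [111, 222, 666, 555],
                     "rating": [2.0, 5.0, 5.0, 3.0]})

# Key-only right-hand sides: with no shared non-key columns, nothing gets
# suffixed; drop_duplicates() stops a repeated key from multiplying rows
user_keys = user[["Product_No"]].drop_duplicates()
active_keys = active[["Product_No"]].drop_duplicates()

active_common = active.merge(user_keys, on="Product_No", how="inner")
user_common = user.merge(active_keys, on="Product_No", how="inner")

print(active_common)
print(user_common)
```

Note that merge returns a fresh 0..n-1 index, whereas filtering with isin or query keeps the original row labels.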

- Akshay Choulwar

- Psidom · Accepted Answer

First, use a set intersection to get the common product_No values, then filter each original DataFrame with the isin method:

common_product = set(active.product_No).intersection(user.product_No)

common_product
# {111, 222}

active[active.product_No.isin(common_product)]

#Customer_ID   product_No   Rating
#0         7          111      3.0
#1         7          222      1.0

user[user.product_No.isin(common_product)]

#Customer_ID   product_No   Rating
#0         9          111      2.0
#1         9          222      5.0
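Applied to the question's own setup, where both customers come from a single ratings frame, the same approach looks roughly like this. The ratings frame is built in memory here as a stand-in for ratings.csv (an assumption), which also shows that isin filtering keeps the original row labels.

```python
import pandas as pd

# In-memory stand-in for pd.read_csv("ratings.csv", names=[...])
ratings = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7, 9, 9, 9, 9],
    "product_No": [111, 222, 333, 444, 111, 222, 666, 555],
    "Rating": [3.0, 1.0, 5.0, 3.0, 2.0, 5.0, 5.0, 3.0],
})

active = ratings[ratings["Customer_ID"] == 7]
user = ratings[ratings["Customer_ID"] == 9]

# Products rated by both customers (a set, so its order is not meaningful)
common_product = set(active.product_No).intersection(user.product_No)

# Boolean masks keep only rows with a common product; the index labels
# from ratings are preserved (user's rows stay at labels 4 and 5)
active_common = active[active.product_No.isin(common_product)]
user_common = user[user.product_No.isin(common_product)]

print(active_common)
print(user_common)
```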