I have imported the following data frame from Excel into pandas, and I want to drop duplicate rows based on the values of two columns.
# Python 3.5.2
# Pandas library version 0.22
import pandas as pd
# Save the Excel workbook in a variable
current_workbook = pd.ExcelFile('C:\\Users\\userX\\Desktop\\cost_values.xlsx')
# convert the workbook to a data frame
current_worksheet = pd.read_excel(current_workbook, index_col = 'vend_num')
# current output
print(current_worksheet)
| vend_number | vend_name            | quantity  | source  |
| ----------- | -------------------- | --------- | ------- |
| CHARLS      | Charlie & Associates | $5,700.00 | Central |
| CHARLS      | Charlie & Associates | $5,700.00 | South   |
| CHARLS      | Charlie & Associates | $5,700.00 | North   |
| CHARLS      | Charlie & Associates | $5,700.00 | West    |
| HUGHES      | Hughinos             | $3,800.00 | Central |
| HUGHES      | Hughinos             | $3,800.00 | South   |
| FERNAS      | Fernanda Industries  | $3,500.00 | South   |
| FERNAS      | Fernanda Industries  | $3,500.00 | North   |
| FERNAS      | Fernanda Industries  | $3,000.00 | West    |
....
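What I want is to drop duplicate rows based on the quantity and source columns:
Check the values of the quantity and source columns:
1.1 If a vendor's quantity equals the quantity in another row for the same vendor, and one of those rows has source Central, drop the vendor's repeated rows and keep only the Central row.
1.2 Otherwise, if a vendor's quantity equals the quantity in another row for the same vendor and none of those rows has source Central, drop the repeated rows so that only one remains.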
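To make the two rules concrete, here is a rough sketch of one way they might be expressed. It is only an illustration, not a confirmed solution. It assumes `vend_number` is a regular column (with `index_col` set as in my code, a `reset_index()` would be needed first), that the quantity values compare equal as displayed, and that when no Central row exists the first repeated row is the one to keep. The `_not_central` helper column is a made-up name.

```python
import pandas as pd

# Sample rows copied from the table above; quantity is kept as displayed text.
df = pd.DataFrame({
    "vend_number": ["CHARLS"] * 4 + ["HUGHES"] * 2 + ["FERNAS"] * 3,
    "vend_name": ["Charlie & Associates"] * 4
                 + ["Hughinos"] * 2
                 + ["Fernanda Industries"] * 3,
    "quantity": ["$5,700.00"] * 4 + ["$3,800.00"] * 2
                + ["$3,500.00", "$3,500.00", "$3,000.00"],
    "source": ["Central", "South", "North", "West",
               "Central", "South",
               "South", "North", "West"],
})

deduped = (
    # Hypothetical helper column: False for Central rows, True otherwise.
    df.assign(_not_central=df["source"].ne("Central"))
      # Stable sort puts Central rows first and keeps the original order elsewhere.
      .sort_values("_not_central", kind="mergesort")
      # Keep one row per vendor/quantity pair; Central wins when it exists.
      .drop_duplicates(subset=["vend_number", "quantity"], keep="first")
      # Restore the original row order and remove the helper column.
      .sort_index()
      .drop(columns="_not_central")
)

print(deduped)
```

On the sample rows this keeps the Central rows for CHARLS and HUGHES, and the South ($3,500.00) and West ($3,000.00) rows for FERNAS.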
Desired result
| vend_number | vend_name            | quantity  | source  |
| ----------- | -------------------- | --------- | ------- |
| CHARLS      | Charlie & Associates | $5,700.00 | Central |
| HUGHES      | Hughinos             | $3,800.00 | Central |
| FERNAS      | Fernanda Industries  | $3,500.00 | South   |
| FERNAS      | Fernanda Industries  | $3,000.00 | West    |
....
So far I have tried the following code, but pandas does not detect any duplicate rows.
print(current_worksheet.loc[current_worksheet.duplicated()])
print(current_worksheet.duplicated())
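A possible explanation, offered as a guess rather than a confirmed diagnosis: called without arguments, `DataFrame.duplicated()` compares every column and ignores the index, so rows that differ only in `source` are never flagged. The sketch below uses three rows from the table above, with `vend_number` as the index to mirror `index_col`. The choice of `vend_name` and `quantity` as the `subset` is an assumption made for illustration.

```python
import pandas as pd

# Three illustrative rows from the table above, indexed by vend_number.
df = pd.DataFrame(
    {
        "vend_name": ["Hughinos", "Hughinos", "Fernanda Industries"],
        "quantity": ["$3,800.00", "$3,800.00", "$3,500.00"],
        "source": ["Central", "South", "South"],
    },
    index=pd.Index(["HUGHES", "HUGHES", "FERNAS"], name="vend_number"),
)

# Default call: all columns are compared, and source differs between rows.
all_columns = df.duplicated()
print(all_columns.tolist())  # [False, False, False]

# Restricting the comparison flags the second HUGHES row as a repeat.
by_subset = df.duplicated(subset=["vend_name", "quantity"])
print(by_subset.tolist())  # [False, True, False]
```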
I have tried to work this out myself but I am stuck, so any help would be greatly appreciated. Feel free to improve this question.