Object combinations with Python's itertools module

Question

python python-3.x pandas combinations python-itertools

4

Can Python's itertools combinations be used on objects rather than lists?

For example, how would I use it on the following data?

Rahul - 20,000 - Mumbai

Shivani - 30,000 - Mumbai

Akash - 40,000 - Bangalore

I want every possible combination of names and salaries. How can I do this with combinations? Assume the data has been read and stored with pd.read_csv.

Current code -

import pandas as pd
import itertools
df = pd.read_csv('stack.csv')

print(df)

for L in range(0, len(df)+1):
    for subset in itertools.combinations(df['Name'], L):
        print(subset)

Output

      Name  Salary       City
0    Rahul   20000     Mumbai
1  Shivani   30000     Mumbai
2    Akash   40000  Bangalore
()
('Rahul',)
('Shivani',)
('Akash',)
('Rahul', 'Shivani')
('Rahul', 'Akash')
('Shivani', 'Akash')
('Rahul', 'Shivani', 'Akash')

Process finished with exit code 0

How can I add the salaries to these combinations?

- user10141156

2

Can you print df.head(5) so we can see the input format? - mad_

4

Welcome to SO. Please take the time to read [mcve], [ask], and the other links on that page. - wwii

1

Is it any better now? - user10141156

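What output do you want? Do you want the salary tied to the name (e.g. Rahul always has 20000), or combinations of (name, salary, name, salary) (e.g. some items where Rahul has 20000, 30000, and 40000)? - sundance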

Please fix your code indentation. - Bram Vanroy

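The final desired output: I have a user-entered "salary", and I want to find the salary (a combination or an individual) closest to it. So if Rahul always has 20000, that will probably be easier. - user10141156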

3 Answers

1

You can use zip to iterate over both columns at the same time, and a list comprehension to build the output DataFrame, for example:

df_output = pd.DataFrame([[', '.join(subset), sum(salaries)] for L in range(1, len(df)+1)
                          for subset, salaries in zip(itertools.combinations(df['Name'], L),
                                                      itertools.combinations(df['Salary'], L))],
                         columns=['Names', 'Sum Salaries'])

and you get:

                   Names  Sum Salaries
0                  Rahul         20000
1                Shivani         30000
2                  Akash         40000
3         Rahul, Shivani         50000
4           Rahul, Akash         60000
5         Shivani, Akash         70000
6  Rahul, Shivani, Akash         90000
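
A variation that is not part of the original answer: itertools.combinations accepts any iterable, so each name can travel together with its salary as a single record instead of zipping two parallel combination streams. The sketch below uses plain Python; the `Employee` type and the `employees` and `results` names are made up for illustration, and the data is typed in by hand rather than read from the CSV.

```python
import itertools
from collections import namedtuple

# Hypothetical record type for illustration; not part of the question's code.
Employee = namedtuple("Employee", ["name", "salary", "city"])

employees = [
    Employee("Rahul", 20000, "Mumbai"),
    Employee("Shivani", 30000, "Mumbai"),
    Employee("Akash", 40000, "Bangalore"),
]

results = []
for size in range(1, len(employees) + 1):
    # Each combo is a tuple of Employee objects, so name and salary stay paired.
    for combo in itertools.combinations(employees, size):
        names = ", ".join(e.name for e in combo)
        total = sum(e.salary for e in combo)
        results.append((names, total))

for row in results:
    print(row)
```

With a DataFrame, `df.itertuples(index=False)` should yield comparable records whose attributes are named after the columns (Name, Salary, City), so the same loop would apply with the attribute names adjusted.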

- Ben.T

0

How about this?

nameList = list()
sumList = list()
for L in range(0, len(df)+1):
    for x in itertools.combinations(df['Name'], L):
        nameList.append(x)
    for y in itertools.combinations(df['Salary'], L):
        sumList.append(sum(y))

newDf = pd.DataFrame()
newDf['Names'] = nameList
newDf['Salary Sum'] = sumList

Output:

                     Names  Salary Sum
0                       ()           0
1                 (Rahul,)       20000
2               (Shivani,)       30000
3                 (Akash,)       40000
4         (Rahul, Shivani)       50000
5           (Rahul, Akash)       60000
6         (Shivani, Akash)       70000
7  (Rahul, Shivani, Akash)       90000
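
As a side note that is not from the original answer: looping over every size L from 0 to len(df) enumerates the power set, which is why the empty tuple with sum 0 appears in row 0. The itertools documentation lists a `powerset` recipe for this; a minimal sketch of it, applied to a hand-typed salary list, might look like:

```python
from itertools import chain, combinations

def powerset(iterable):
    """Return an iterator over every subset of iterable, as tuples."""
    items = list(iterable)
    # Sizes 0..len(items); start the range at 1 to skip the empty subset.
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

salaries = [20000, 30000, 40000]
sums = [sum(subset) for subset in powerset(salaries)]
print(sums)
```

Starting the range at 1 instead of 0 would drop the empty combination, matching the other answers.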

- Ankur Sinha

- user3483203 · Accepted Answer

First, get your indices:

idx = [j for i in range(1, len(df) + 1) for j in list(itertools.combinations(df.index, i))]
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]

Get the DataFrame for each group:

dfs = [df.iloc[list(i)] for i in idx]

Finally, join and sum:

out = [(', '.join(i['Name']), int(i['Salary'].sum())) for i in dfs]

Output:

[('Rahul', 20000),
 ('Shivani', 30000),
 ('Akash', 40000),
 ('Rahul, Shivani', 50000),
 ('Rahul, Akash', 60000),
 ('Shivani, Akash', 70000),
 ('Rahul, Shivani, Akash', 90000)]

If you want to turn this into a DataFrame, that is simple:

df1 = pd.DataFrame(out, columns=['names', 'salaries'])

                   names  salaries
0                  Rahul     20000
1                Shivani     30000
2                  Akash     40000
3         Rahul, Shivani     50000
4           Rahul, Akash     60000
5         Shivani, Akash     70000
6  Rahul, Shivani, Akash     90000

To query this DataFrame for the value closest to a given salary, we can write a helper function:

def return_closest(val):
    # idxmin returns an index label, so look the row up with .loc
    return df1.loc[(df1.salaries - val).abs().idxmin()]


>>> return_closest(55000)
names       Rahul, Shivani
salaries             50000
Name: 3, dtype: object
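
Note that 55000 is equally far from 50000 and 60000; idxmin reports only the first minimum, which is why row 3 comes back. If ties matter, a small variant could return every row at the minimum distance. This is a sketch under assumptions: `closest_rows` is a hypothetical helper name, and `df1` is rebuilt by hand here so the block runs on its own.

```python
import pandas as pd

# Rebuilt by hand to mirror the df1 shown above.
df1 = pd.DataFrame({
    "names": ["Rahul", "Shivani", "Akash", "Rahul, Shivani",
              "Rahul, Akash", "Shivani, Akash", "Rahul, Shivani, Akash"],
    "salaries": [20000, 30000, 40000, 50000, 60000, 70000, 90000],
})

def closest_rows(frame, target):
    """Return all rows whose salary is at the minimum distance from target."""
    distance = (frame["salaries"] - target).abs()
    return frame[distance == distance.min()]

print(closest_rows(df1, 55000))
```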

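I broke this into several steps so you can see what happens at each one. Once you understand them, you can combine everything into a single statement that creates your DataFrame: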

pd.DataFrame(
    [(', '.join(d['Name']), int(d['Salary'].sum()))
     for i in [j for i in range(1, len(df) + 1)
               for j in itertools.combinations(df.index, i)]
     for d in [df.iloc[list(i)]]],
    columns=['names', 'salaries']
)
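
Since n rows produce 2**n - 1 non-empty combinations, materializing all of them in a DataFrame may get expensive as the data grows. If only the closest match is needed, one possible alternative is to stream the combinations through min. This is a sketch rather than part of the original answer; `closest_combination` is a hypothetical helper, and it takes plain lists such as `df['Name'].tolist()`.

```python
import itertools

def closest_combination(names, salaries, target):
    """Return (names, total) for the combination whose salary sum is nearest target."""
    records = list(zip(names, salaries))
    # Generator: combinations are produced one at a time, never stored together.
    candidates = (
        combo
        for size in range(1, len(records) + 1)
        for combo in itertools.combinations(records, size)
    )
    # On ties, min keeps the first candidate it encountered.
    best = min(candidates, key=lambda combo: abs(sum(s for _, s in combo) - target))
    return [name for name, _ in best], sum(s for _, s in best)

print(closest_combination(["Rahul", "Shivani", "Akash"], [20000, 30000, 40000], 58000))
```

This still visits every combination, so the running time stays exponential in the number of rows; only the memory use is reduced.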