Iterating over a list of lists by index in a loop to reformat strings


I have a list of lists that looks like this, pulled from a poorly formatted csv file:

DF = [['Customer Number: 001 '],
 ['Notes: Bought a ton of stuff and was easy to deal with'],
 ['Customer Number: 666 '],
 ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
 ['Customer Number: 103 '],
 ['Notes: bought a ton of stuff got a free keychain'],
 ['Notes: gave us a referral to his uncles cousins hairdresser'],
 ['Notes: name address birthday social security number on file'],
 ['Customer Number: 007 '],
 ['Notes: looked a lot like James Bond'],
 ['Notes: came in with a martini']]

I would like to end up with a new structure like this:

['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

After that I can split, strip, etc. further.

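So, taking advantage of the facts that:

  • customer numbers always start with Customer Number
  • Notes are usually longer
  • there are never more than 5 Notes

I wrote what is obviously an absurd solution, even though it works.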

DF = [item for sublist in DF for item in sublist]
DF = DF + ['stophere']
DF2 = []

for record in DF:
    if (record[0:17]=="Customer Number: ") & (record !="stophere"):
        DF2.append(record + DF[DF.index(record)+1])
        if len(DF[DF.index(record)+2]) >21:
            DF2.append(record + DF[DF.index(record)+2])
            if len(DF[DF.index(record)+3]) >21:
                DF2.append(record + DF[DF.index(record)+3])
                if len(DF[DF.index(record)+4]) >21:
                    DF2.append(record + DF[DF.index(record)+4])
                    if len(DF[DF.index(record)+5]) >21:
                        DF2.append(record + DF[DF.index(record)+5])

Can anyone suggest a more robust, smarter solution to this kind of problem?

6 Answers

13

Just keep track of when we find a new customer:

from pprint import pprint as pp

out = []
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
pp(out)

Output:

['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
 'hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
 'hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number '
 'on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']
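If you need this in more than one place, the same "remember the current customer" loop can be wrapped in a generator. This is a minimal sketch, not part of the answer above: the function name `with_customer` and the `header` parameter are hypothetical, it uses a shortened copy of DF, and it assumes every row is a one-element list whose first row is a customer line.

```python
# Shortened copy of the question's DF, for illustration only.
DF = [['Customer Number: 001 '],
      ['Notes: Bought a ton of stuff and was easy to deal with'],
      ['Customer Number: 103 '],
      ['Notes: bought a ton of stuff got a free keychain'],
      ['Notes: gave us a referral to his uncles cousins hairdresser']]


def with_customer(rows, header="Customer Number"):
    """Yield each note prefixed with the most recent header line.

    Hypothetical helper. Assumes each row has exactly one cell and that the
    first row is a header; a note before any header raises UnboundLocalError.
    """
    for (line,) in rows:          # unpack the single cell of each row
        if line.startswith(header):
            cust = line           # a new customer starts here
        else:
            yield cust + line     # attach the note to the current customer


out = list(with_customer(DF))
for s in out:
    print(s)
```

Because it is a generator, it also works on rows streamed straight from a csv reader without building the whole list first.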

If a customer can appear again later and you want all of its notes grouped together, use a dict:
from collections import defaultdict
d = defaultdict(list)
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        d[cust].append(cust + sub[0])
pp(d)

Output:

{'Customer Number: 001 ': ['Customer Number: 001 Notes: Bought a ton of '
                           'stuff and was easy to deal with'],
 'Customer Number: 007 ': ['Customer Number: 007 Notes: looked a lot like '
                           'James Bond',
                           'Customer Number: 007 Notes: came in with a '
                           'martini'],
 'Customer Number: 103 ': ['Customer Number: 103 Notes: bought a ton of '
                           'stuff got a free keychain',
                           'Customer Number: 103 Notes: gave us a referral '
                           'to his uncles cousins hairdresser',
                           'Customer Number: 103 Notes: name address '
                           'birthday social security number on file'],
 'Customer Number: 666 ': ['Customer Number: 666 Notes: acted and looked '
                           'like Chris Farley on that hidden decaf skit '
                           'from SNL']}
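To see why the dict helps when a customer repeats, here is a small sketch on hypothetical input in which customer 001 shows up twice; both of its notes end up under the same key. The `rows` data is invented for illustration.

```python
from collections import defaultdict

# Hypothetical input: customer 001 appears twice.
rows = [['Customer Number: 001 '], ['Notes: first visit'],
        ['Customer Number: 666 '], ['Notes: hidden decaf'],
        ['Customer Number: 001 '], ['Notes: came back']]

d = defaultdict(list)
for (line,) in rows:
    if line.startswith("Customer Number"):
        cust = line
    else:
        # defaultdict creates the empty list on first use, so a repeated
        # customer keeps appending to the list it already has.
        d[cust].append(cust + line)

for cust, notes in d.items():
    print(cust, len(notes))
```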

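Based on your comment and the error you got, it looks like your text contains lines that appear before the first actual customer, so we can attach them to the first customer in the list: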

# added ["foo"] before we see any customer

DF = [["foo"],['Customer Number: 001 '],
 ['Notes: Bought a ton of stuff and was easy to deal with'],
 ['Customer Number: 666 '],
 ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
 ['Customer Number: 103 '],
 ['Notes: bought a ton of stuff got a free keychain'],
 ['Notes: gave us a referral to his uncles cousins hairdresser'],
 ['Notes: name address birthday social security number on file'],
 ['Customer Number: 007 '],
 ['Notes: looked a lot like James Bond'],
 ['Notes: came in with a martini']]


from pprint import pprint as pp

from itertools import takewhile, islice

# find lines up to first customer
start = list(takewhile(lambda x: "Customer Number:" not in x[0], DF))

out = []
ln = len(start)
# if we had data before we actually found a customer this will be True
if start: 
    # so set cust to first customer in list and start adding to out
    cust = DF[ln][0]
    for sub in start:
        out.append(cust + sub[0])
# ln will either be 0 if start is empty else we start at first customer
for sub in islice(DF, ln, None):
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
pp(out)

This will output:

 ['Customer Number: 001 foo',
 'Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
 'hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
 'hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number '
 'on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

I assume you would treat the lines that appear before the first customer as belonging to the first customer.
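A shorter way to get the same attribution, offered as an alternative sketch rather than the answer's method, is to pre-seed `cust` with the first customer line found anywhere in the data and then run the original single loop. It assumes at least one customer line exists; otherwise `next()` raises StopIteration.

```python
# Shortened copy of the example above, with "foo" before the first customer.
DF = [["foo"],
      ['Customer Number: 001 '],
      ['Notes: Bought a ton of stuff and was easy to deal with'],
      ['Customer Number: 666 '],
      ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL']]

# Pre-seed cust with the first header in the data, so any lines that
# appear before it are attributed to that first customer.
cust = next(row[0] for row in DF if row[0].startswith("Customer Number"))

out = []
for row in DF:
    if row[0].startswith("Customer Number"):
        cust = row[0]
    else:
        out.append(cust + row[0])

print(out)
```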


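@MattO'Brien, if the first element of your list is not actually Customer Number:..., then a different approach is needed. What happens if a customer comes after the text? For example, [["foo"],["Customer Number: 100"]] is the start of your list. - Padraic Cunningham
Ah... sorry, I made a small mistake. I'll delete my comment. Thanks! - tumultous_rooster
@MattO'Brien, no worries, I'll keep the alternative code anyway since it may be useful to someone. - Padraic Cunningham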

4
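Your basic goal is to group the notes and associate each one with its customer. Since the list is already in order (each customer line is followed by its notes), you can use itertools.groupby directly, as shown below.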
from itertools import groupby, chain

def build_notes(it):
    customer, func = "", lambda x: x.startswith('Customer')
    for item, grp in groupby(chain.from_iterable(it), key=func):
        if item:
            customer = next(grp)
        else:
            for note in grp:
                yield customer + note
            # In Python 3.x, you can simply do
            # yield from (customer + note for note in grp)

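Here, chain.from_iterable flattens the actual list of lists into a sequence of strings. We then group the lines by whether they start with Customer. For a group of Customer lines, item is True; otherwise it is False. When item is True, we take the customer line; when item is False, we iterate over the group of notes and yield one string at a time by concatenating the customer line and the note.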
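It can help to look at exactly what groupby hands back before the generator consumes it. This sketch materialises each group on a shortened copy of DF; `is_customer` is a hypothetical name standing in for the lambda used as `func`.

```python
from itertools import chain, groupby

DF = [['Customer Number: 103 '],
      ['Notes: bought a ton of stuff got a free keychain'],
      ['Notes: gave us a referral to his uncles cousins hairdresser'],
      ['Customer Number: 007 '],
      ['Notes: looked a lot like James Bond']]


def is_customer(s):
    return s.startswith('Customer')


flat = chain.from_iterable(DF)  # one string per row

# groupby only merges *consecutive* items with the same key, which is why
# the input must already be in customer-then-notes order.
groups = [(key, list(grp)) for key, grp in groupby(flat, key=is_customer)]
for key, grp in groups:
    print(key, grp)
```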

When you run build_notes with

print(list(build_notes(DF)))

you get:

['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']
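One subtlety of `customer = next(grp)`: if two customer lines ever appear back to back (a hypothetical customer with no notes, which the sample data does not contain), groupby puts both in one True group and `next` picks the first, so the following notes get attached to the wrong customer. A sketch of a variant that keeps the last header of each run instead; the name `build_notes_last` is hypothetical.

```python
from collections import deque
from itertools import chain, groupby


def build_notes_last(rows):
    """Like build_notes, but keeps the *last* header of each run of headers."""
    customer = ""
    for is_header, grp in groupby(chain.from_iterable(rows),
                                  key=lambda s: s.startswith('Customer')):
        if is_header:
            # deque(..., maxlen=1) consumes the group and keeps only its last item
            customer = deque(grp, maxlen=1)[0]
        else:
            for note in grp:
                yield customer + note


rows = [['Customer Number: 001 '],   # hypothetical customer with no notes
        ['Customer Number: 002 '],
        ['Notes: only note']]
print(list(build_notes_last(rows)))
```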

3
DF = [['Customer Number: 001 '],
 ['Notes: Bought a ton of stuff and was easy to deal with'],
 ['Customer Number: 666 '],
 ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
 ['Customer Number: 103 '],
 ['Notes: bought a ton of stuff got a free keychain'],
 ['Notes: gave us a referral to his uncles cousins hairdresser'],
 ['Notes: name address birthday social security number on file'],
 ['Customer Number: 007 '],
 ['Notes: looked a lot like James Bond'],
 ['Notes: came in with a martini']]

custnumstr = None
out = []
for df in DF:
    if df[0].startswith('Customer Number'):
        custnumstr = df[0]
    else:
        out.append(custnumstr + df[0])

for e in out:
    print(e)
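Starting from `custnumstr = None` means a note that appears before any customer already fails, with a TypeError from `None + str`. If you would rather get an explicit message, a sketch with a hypothetical helper `join_notes` that checks for that case:

```python
def join_notes(rows):
    """Hypothetical helper: prefix each note with its customer line."""
    custnumstr = None
    out = []
    for (line,) in rows:
        if line.startswith('Customer Number'):
            custnumstr = line
        elif custnumstr is None:
            # Without this check, None + str would raise a less helpful TypeError.
            raise ValueError('note before any customer: %r' % line)
        else:
            out.append(custnumstr + line)
    return out


good = [['Customer Number: 001 '], ['Notes: easy to deal with']]
bad = [['Notes: orphan note'], ['Customer Number: 001 ']]

print(join_notes(good))
try:
    join_notes(bad)
except ValueError as exc:
    print('rejected:', exc)
```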

3
You can also use an OrderedDict, where the keys are the customers and the values are lists of notes:
from collections import OrderedDict

DF_dict = OrderedDict()

for subl in DF:
    if 'Customer Number' in subl[0]:  
        DF_dict[subl[0]] = []
        continue    
    last_key = list(DF_dict.keys())[-1]
    DF_dict[last_key].append(subl[0])


for customer, notes in  DF_dict.items():
    for a_note in notes:
        print(customer,a_note)

The result is:

Customer Number: 001  Notes: Bought a ton of stuff and was easy to deal with
Customer Number: 666  Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL
Customer Number: 103  Notes: bought a ton of stuff got a free keychain
Customer Number: 103  Notes: gave us a referral to his uncles cousins hairdresser
Customer Number: 103  Notes: name address birthday social security number on file
Customer Number: 007  Notes: looked a lot like James Bond
Customer Number: 007  Notes: came in with a martini

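Putting the values into a dict like this can be useful if you want to count the notes for a given customer, or to select only the notes for a given customer.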
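A short sketch of those lookups, using a hand-built copy of part of DF_dict. Note that the keys keep the trailing space from the original csv lines, so a lookup must include it (or you could strip the keys when building the dict).

```python
from collections import OrderedDict

DF_dict = OrderedDict([
    ('Customer Number: 001 ', ['Notes: Bought a ton of stuff and was easy to deal with']),
    ('Customer Number: 103 ', ['Notes: bought a ton of stuff got a free keychain',
                               'Notes: gave us a referral to his uncles cousins hairdresser',
                               'Notes: name address birthday social security number on file']),
    ('Customer Number: 007 ', ['Notes: looked a lot like James Bond',
                               'Notes: came in with a martini']),
])

# Number of notes per customer.
counts = {customer: len(notes) for customer, notes in DF_dict.items()}
print(counts)

# All notes for one customer (empty list if the customer is unknown).
bond_notes = DF_dict.get('Customer Number: 007 ', [])
print(bond_notes)

# Customers with more than one note, in their original order.
repeaters = [c for c, notes in DF_dict.items() if len(notes) > 1]
print(repeaters)
```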

Alternatively, to avoid calling list(DF_dict.keys())[-1] on every iteration:

last_key = ''

for subl in DF:
    if 'Customer Number' in subl[0]:  
        DF_dict[subl[0]] = []
        last_key = subl[0]
        continue    

    DF_dict[last_key].append(subl[0])

A newer, shorter version using defaultdict:

from collections import defaultdict

DF_dict = defaultdict(list)

for subl in DF:
    if 'Customer Number' in subl[0]:         
        customer = subl[0]
        continue        

    DF_dict[customer].append(subl[0])

You could also use k = subl[0]; DF_dict.setdefault(k, []) and drop the continue. - Padraic Cunningham
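A sketch of what that comment suggests, with an if/else in place of `continue`. A side benefit: `setdefault` only creates the list when the key is new, whereas `DF_dict[subl[0]] = []` replaces any notes already collected if a customer line repeats. The `rows` data with a repeated customer is hypothetical.

```python
from collections import OrderedDict

rows = [['Customer Number: 001 '], ['Notes: first visit'],
        ['Customer Number: 666 '], ['Notes: hidden decaf'],
        ['Customer Number: 001 '], ['Notes: came back']]  # 001 repeats (hypothetical)

DF_dict = OrderedDict()
for subl in rows:
    line = subl[0]
    if line.startswith('Customer Number'):
        k = line
        DF_dict.setdefault(k, [])   # create the list only if k is new
    else:
        DF_dict[k].append(line)

print(DF_dict)
```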

2
As long as the format matches your example, this should work:
final_list = []
for outer_list in DF:
    for s in outer_list:
        if s.startswith("Customer"):
            cust = s
        elif s.startswith("Notes"):
            final_list.append(cust + s)

for f in final_list:
    print(f)
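Because this version loops over every string in each inner list rather than only `[0]`, it also copes with a row that holds more than one cell, and the `elif` quietly skips lines that are neither customers nor notes. A sketch on hypothetical messier input:

```python
# Hypothetical input: one row holds both cells, and one line is neither
# a customer nor a note.
DF = [['Customer Number: 001 ', 'Notes: easy to deal with'],
      ['Phone: n/a'],
      ['Notes: paid in cash']]

final_list = []
for outer_list in DF:
    for s in outer_list:            # visit every cell, not just the first
        if s.startswith("Customer"):
            cust = s
        elif s.startswith("Notes"):
            final_list.append(cust + s)
        # anything else (e.g. 'Phone: n/a') is silently skipped

for f in final_list:
    print(f)
```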

2
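You can do this as long as the first element is a customer. Simply iterate over each item. If the item is a customer, set the current customer to that string. Otherwise it is a note, so append the customer plus the note to the results list.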
customer = ""
results = []
for record in DF:
    data = record[0]
    if "Customer" in data:
        customer = data
    elif "Notes" in data:
        result = customer + data
        results.append(result)

print(results)
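One caveat with the substring test `"Customer" in data`: a note whose text happens to mention the word Customer is mistaken for a new customer line. This sketch compares it with `startswith`; the note text and the helper `collect` are hypothetical.

```python
rows = [['Customer Number: 001 '],
        ['Notes: Customer asked for a refund']]  # hypothetical note text


def collect(rows, is_customer):
    """Hypothetical helper: the loop above, with a pluggable customer test."""
    customer, results = "", []
    for record in rows:
        data = record[0]
        if is_customer(data):
            customer = data
        elif "Notes" in data:
            results.append(customer + data)
    return results


loose = collect(rows, lambda s: "Customer" in s)                 # substring test
strict = collect(rows, lambda s: s.startswith("Customer Number"))

print(loose)   # the note was taken for a customer line, so nothing is collected
print(strict)
```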

Content provided by Stack Overflow.