I have a list of lists that looks like this, pulled from a badly formatted csv file:
DF = [['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
I'd like to end up with a new structure like this:
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address birthday social security number on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
which I can then split, strip, etc. further.
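Once the strings are in that form, the follow-up split/strip step could look something like this minimal sketch. `split_record` is a hypothetical helper, and it assumes the label `Notes:` appears once per line, after the customer number:

```python
def split_record(line):
    """Split a combined 'Customer Number: ... Notes: ...' string into a
    (customer_number, note) tuple of stripped strings.

    Hypothetical helper; assumes 'Notes:' follows the customer number.
    """
    # partition splits at the first 'Notes:' only, so a note that itself
    # contains 'Notes:' keeps the rest of its text intact
    head, _, note = line.partition("Notes:")
    number = head.replace("Customer Number:", "", 1).strip()
    return number, note.strip()


print(split_record("Customer Number: 103 Notes: bought a ton of stuff got a free keychain"))
# ('103', 'bought a ton of stuff got a free keychain')
```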
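So, relying on the facts that:
- customer numbers always start with `Customer Number`
- notes are usually longer
- there are never more than 5 notes
I wrote what is obviously a ridiculous solution, although it works.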
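    # Flatten the list of one-element lists into a flat list of strings
    DF = [item for sublist in DF for item in sublist]
    # Short sentinel marking the end of the list (fails the length check below)
    DF = DF + ['stophere']
    DF2 = []
    for record in DF:
        if (record[0:17] == "Customer Number: ") and (record != "stophere"):
            # The line right after a customer number is always taken as a note
            DF2.append(record + DF[DF.index(record)+1])
            # 21 == len("Customer Number: 001 "), so only longer lines count as notes;
            # each further look-ahead runs only if the previous one found a note
            if len(DF[DF.index(record)+2]) > 21:
                DF2.append(record + DF[DF.index(record)+2])
                if len(DF[DF.index(record)+3]) > 21:
                    DF2.append(record + DF[DF.index(record)+3])
                    if len(DF[DF.index(record)+4]) > 21:
                        DF2.append(record + DF[DF.index(record)+4])
                        if len(DF[DF.index(record)+5]) > 21:
                            DF2.append(record + DF[DF.index(record)+5])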
Can anyone suggest a more robust, smarter solution to this kind of problem?
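One more robust option is a single pass that remembers the most recent `Customer Number:` line and prefixes every following note with it. This is a sketch under the assumption that every non-header line is a note belonging to the last header above it; `merge_notes` is a hypothetical name. Unlike the index-based version, it doesn't rely on note length (a note of 21 characters or fewer would be skipped there), on a maximum of five notes, or on `list.index`, which returns the first match and so misbehaves if the same customer line appears twice.

```python
def merge_notes(rows, header="Customer Number:"):
    """Prefix each note with the customer-number line that precedes it.

    rows is a list of one-element lists, as read from the csv.
    Raises ValueError if a note appears before any customer number.
    """
    merged = []
    current = None  # most recent customer-number line seen so far
    for row in rows:
        for cell in row:  # iterating the sublists flattens them on the fly
            cell = cell.strip()
            if not cell:
                continue  # skip blank cells
            if cell.startswith(header):
                current = cell
            elif current is None:
                raise ValueError("note found before any customer number: %r" % cell)
            else:
                # strip() above removed the trailing space, so add exactly one
                merged.append(current + " " + cell)
    return merged


sample = [['Customer Number: 103 '],
          ['Notes: bought a ton of stuff got a free keychain'],
          ['Notes: gave us a referral to his uncles cousins hairdresser']]
for line in merge_notes(sample):
    print(line)
```

A customer with no notes produces no output line here; if those should be kept, that is a small change in the header branch.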
… `Customer Number:...`, then a different approach would be needed. What happens if a customer number comes after the text, e.g. if `[["foo"],["Customer Number: 100"]]` is the start of your list? - Padraic Cunningham
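If, as the comment asks, text can appear before the first `Customer Number:` line, one option is to group rather than concatenate, and park such orphan lines under a `None` key so the caller decides what to do with them. A sketch, assuming Python 3.7+ (insertion-ordered dicts); `group_notes` is a hypothetical name:

```python
def group_notes(rows, header="Customer Number:"):
    """Map each customer number to its list of notes.

    Lines that appear before any customer-number line are collected
    under the key None instead of being dropped or raising an error.
    """
    groups = {}
    current = None  # None until the first customer-number line
    for row in rows:
        for cell in row:
            cell = cell.strip()
            if not cell:
                continue  # skip blank cells
            if cell.startswith(header):
                current = cell[len(header):].strip()  # e.g. '100'
                groups.setdefault(current, [])  # keep customers with no notes
            else:
                groups.setdefault(current, []).append(cell)
    return groups


print(group_notes([["foo"], ["Customer Number: 100"], ["Notes: bar"]]))
# {None: ['foo'], '100': ['Notes: bar']}
```

Because the result is keyed by customer number, a customer that shows up twice in the file gets its notes merged into one list.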