I have a pandas DataFrame that looks like this:
ID Col.A
28654 This is a dark chocolate which is sweet
39876 Sky is blue 1234 Sky is cloudy 3423
88776 Stars can be seen in the dark sky
35491 Schools are closed 4568 but shops are open
I am trying to split Col.A just before the word dark or before a run of digits. My expected result is as follows:
ID     Col.A                     Col.B
28654  This is a                 dark chocolate which is sweet
39876  Sky is blue               1234 Sky is cloudy 3423
88776  Stars can be seen in the  dark sky
35491  Schools are closed        4568 but shops are open
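I tried grouping the rows that contain the word dark into one DataFrame and the rows that contain digits into another, then splitting each group accordingly. After that I can concatenate the resulting DataFrames to get the expected result. The code is as follows:
import pandas as pd

df = pd.DataFrame({'ID':[28654,39876,88776,35491], 'Col.A':['This is a dark chocolate which is sweet',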
'Sky is blue 1234 Sky is cloudy 3423',
'Stars can be seen in the dark sky',
'Schools are closed 4568 but shops are open']})
df1 = df[df['Col.A'].str.contains(' dark ')==True]
df2 = df.merge(df1,indicator = True, how='left').loc[lambda x : x['_merge']!='both']
df1 = df1["Col.A"].str.split(' dark ', expand = True)
df2 = df2["Col.A"].str.split(r'\d+', expand = True)
pd.concat([df1, df2], axis =0)
The result I get does not match what I expected:
   0                         1
0  This is a                 chocolate which is sweet
2  Stars can be seen in the  sky
1  Sky is blue               Sky is cloudy
3  Schools are closed        but shops are open
I lose the digits in the string and the word dark in the result.
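As far as I can tell, the loss happens because splitting consumes whatever the separator pattern matched. A minimal sketch with Python's built-in `re` module on one of the sample strings seems to confirm this (the variable names are only illustrative):

```python
import re

text = "Schools are closed 4568 but shops are open"

# re.split discards the text matched by the pattern, so the digits vanish.
parts_without_digits = re.split(r"\d+", text)
print(parts_without_digits)

# Wrapping the pattern in a capturing group keeps each match as its own item.
parts_with_digits = re.split(r"(\d+)", text)
print(parts_with_digits)
```

The capturing group keeps the digits, but as a separate piece rather than attached to the text that follows them, so it is not quite what I want either.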
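So how can I fix this and get the result without losing the word and the digits I split on?
Is there a way to "slice before the expected word or digits" without removing them?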
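One direction that might work, offered as a sketch under assumptions rather than a definitive answer, is a regex lookahead: split on the whitespace that precedes the word or digits, so the word or digits themselves are only inspected and not consumed. This assumes that only the first split point in each row matters and that dark appears as a whole word preceded by whitespace. The names `pattern`, `parts` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [28654, 39876, 88776, 35491],
    "Col.A": [
        "This is a dark chocolate which is sweet",
        "Sky is blue 1234 Sky is cloudy 3423",
        "Stars can be seen in the dark sky",
        "Schools are closed 4568 but shops are open",
    ],
})

# Match whitespace that is followed by the word "dark" or by a digit.
# The lookahead (?=...) only peeks ahead, so "dark" and the digits stay
# in the right-hand piece; only the whitespace is removed.
pattern = r"\s+(?=dark\b|\d)"

# n=1 stops after the first split point, giving at most two columns.
parts = df["Col.A"].str.split(pattern, n=1, expand=True)
parts.columns = ["Col.A", "Col.B"]

# Reattach the ID column; both frames share the same index.
result = pd.concat([df[["ID"]], parts], axis=1)
print(result)
```

With this approach there is no need to separate the dark rows from the digit rows first, since a single pattern covers both cases. Rows that contain neither would presumably end up with a missing value in Col.B.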