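I have a DataFrame in which one column contains text. For each row of that column, I want to find and extract the substring that lies between two marker strings. This is what I tried: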
startinds = df[column].str.find("First Event = ")
endinds = df[column].str.find("\nLast Event = ")
df["first_timestamp"] = df[column].str.slice(startinds,endinds)
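This doesn't work because startinds and endinds are both Series, so I can't pass them as indices to slice the strings in column.
Does anyone know a way to access the per-row values so I can take the substring for each row?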
Sample input:
Data
0 "Blahblah
First Event = 09/20/2017 12:00:00
Last Event = 09/20/2017 13:00:00
Blahblahblah"
1 "Blahblahblahblah
Blahablahblah
First Event = 09/20/2017 12:30:00
Last Event = 09/20/2017 12:45:00
Blahblahblah"
Desired output:
first_timestamp
0 "First Event = 09/20/2017 12:00:00"
1 "First Event = 09/20/2017 12:30:00"
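One way to keep the original find-based idea is to pair each string with its own start and end offsets and slice row by row. The following is only a minimal sketch: the DataFrame is rebuilt from the sample input, the column name "Data" is taken from that sample, and it assumes the column holds no missing values and that the text contains real newline characters.

```python
import pandas as pd

# Rebuild the sample input; assumes real newlines separate the lines of text.
df = pd.DataFrame({
    "Data": [
        "Blahblah\nFirst Event = 09/20/2017 12:00:00\n"
        "Last Event = 09/20/2017 13:00:00\nBlahblahblah",
        "Blahblahblahblah\nBlahablahblah\nFirst Event = 09/20/2017 12:30:00\n"
        "Last Event = 09/20/2017 12:45:00\nBlahblahblah",
    ]
})
column = "Data"

# Per-row offsets, exactly as in the question (-1 means "not found").
startinds = df[column].str.find("First Event = ")
endinds = df[column].str.find("\nLast Event = ")

# Slice each string with its own pair of offsets instead of passing Series to str.slice.
df["first_timestamp"] = [
    text[start:end] if start != -1 and end != -1 else None
    for text, start, end in zip(df[column], startinds, endinds)
]

print(df["first_timestamp"].tolist())
```

Rows where either marker is missing get None here rather than a wrong slice; whether that is the right fallback depends on the data.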
"First Event = " + df.Data.str.extract('(?<=First Event = )(.*)(?=\\\\nLast Event)', expand=False)
吗? - Zero
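A variant of the str.extract idea, sketched under the assumption that the text contains real newline characters (not a literal backslash followed by n), is to put the label inside the capture group so nothing needs to be prepended afterwards. The DataFrame below is rebuilt from the sample input and the pattern is illustrative, not the commenter's exact expression.

```python
import pandas as pd

# Rebuild the sample input; assumes real newlines separate the lines of text.
df = pd.DataFrame({
    "Data": [
        "Blahblah\nFirst Event = 09/20/2017 12:00:00\n"
        "Last Event = 09/20/2017 13:00:00\nBlahblahblah",
        "Blahblahblahblah\nBlahablahblah\nFirst Event = 09/20/2017 12:30:00\n"
        "Last Event = 09/20/2017 12:45:00\nBlahblahblah",
    ]
})

# "." does not match a newline by default, so ".*" stops at the end of the line.
# The single capture group keeps the "First Event = " label in the result.
pattern = r"(First Event = .*)\nLast Event = "

# expand=False returns a Series for a single group; rows without a match become NaN.
df["first_timestamp"] = df["Data"].str.extract(pattern, expand=False)

print(df["first_timestamp"].tolist())
```

This avoids computing offsets at all, at the cost of relying on the "Last Event" line directly following the "First Event" line.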