How do I find the start and end indices of a word in a list of strings? (Python)

3

I have a list of strings and need to check whether each one contains the word 'American'. If it does, I want to find the start and end index of that word.

['Here in Americans, people say “Can I get a bag for the stuff?”',
 'Typically in restaurant after you are done with meal, you ask for check in Americans from the waiter.',
 'When mixing coffee, people in American use creamer, which is equivalent of milk.']

Desired output: the start and end index of the word “American” in each string.
8,16
75,83
30,38
6 Answers

5
You can use re.search, which returns a match object (or None when nothing matches) whose start and end methods give you what you are looking for:
import re

l = [
    'Here in Americans, people say “Can I get a bag for the stuff?”',
    'Typically in restaurant after you are done with meal, you ask for check in Americans from the waiter.',
    'When mixing coffee, people in American use creamer, which is equivalent of milk.',
    'Hello World'
]

for string in l:
    match = re.search('American', string)
    if match:
        print('%d,%d' % (match.start(), match.end()))
    else:
        print('no match found')

This outputs:
8,16
75,83
30,38
no match found
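
Note that the pattern above also matches the first eight characters of 'Americans'. If only the standalone word should count, which is an assumption about intent rather than something the question states, a word-boundary pattern is one option. A minimal sketch (the helper name whole_word_span and the shortened sentences are made up for illustration):

```python
import re

# \b matches a word boundary, so 'Americans' is not treated as a match.
whole_word = re.compile(r'\bAmerican\b')


def whole_word_span(text):
    """Return (start, end) of the first whole-word match, or None."""
    match = whole_word.search(text)
    return (match.start(), match.end()) if match else None


print(whole_word_span('When mixing coffee, people in American use creamer.'))  # (30, 38)
print(whole_word_span('Here in Americans, people say hello.'))  # None
```

With this pattern the first two sentences from the question would report no match, so it only fits if the plural form is meant to be excluded.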

2
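You can use something like str.find(search_item). It returns the index of the first occurrence of the search item (or -1 if it is not found), and the end index is then simply index + len(search_item).
Like this: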
string = "Hello world!"
search_item = "world"
search_index = string.find(search_item)
search_index_end = search_index+len(search_item)

print(string[search_index:search_index_end])

Output:

world

search_index = 6
search_index_end = 11

1
I think you should take a look at the str.find method: https://docs.python.org/3/library/stdtypes.html#str.find Example:
>>> str1 = 'Here in Americans, people say "Can I get a bag for the stuff?"'
>>> str2 = "Americans"
>>> print(str1.find(str2))
8

Loop over the list to get what you need.

Hope this helps.


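Because if the word is repeated in a sentence, it will only show the first one, fine. But using the find method is my answer. - Bazinga
The asker wants all start and end indices of the word. Your solution only gives the first start index. The find method always returns the first index, so it is not suited to this problem... - yuvgin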
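
The limitation raised in these comments can be worked around by calling str.find repeatedly and passing its optional start argument, so that each search resumes where the previous match ended. A rough sketch, not taken from any of the answers (the helper name find_all and the sample sentence are made up for illustration):

```python
def find_all(text, word):
    """Return (start, end) pairs for every non-overlapping occurrence of word."""
    spans = []
    if not word:
        # An empty search term would match everywhere; treat it as no match.
        return spans
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        spans.append((start, end))
        # Resume the search where the previous match ended.
        start = text.find(word, end)
    return spans


print(find_all('American football is what American fans call football.', 'American'))
# [(0, 8), (26, 34)]
```

Because the search resumes at the end of each match, overlapping occurrences are not reported.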

1

Using re and a list comprehension. Inspired by @blhsing's solution.

import re
a=['Here in Americans, people say “Can I get a bag for the stuff?”',
 'Typically in restaurant after you are done with meal, you ask for check in Americans from the waiter.',
 'When mixing coffee, people in American use creamer, which is equivalent of milk.']

regex  = re.compile('American')

[(match.start(), match.end())  for i in a for match in regex.finditer(i)]

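OK... if the word occurs twice in a sentence, it only shows the first one, but I need the start and end index of the word “American”. By default, “American consider football means” refers to American football. @mad_ - thrinadhn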
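
On the concern about a word occurring twice: re.finditer yields every non-overlapping match, not just the first. The flat list produced by the comprehension does not show which sentence each pair came from, though, so one variation is to group the spans per sentence. A sketch under that assumption (the sample sentences and the name spans_by_sentence are made up for illustration):

```python
import re

sentences = [
    'Here in Americans, people say hello.',
    'In American diners, American coffee comes with creamer.',
    'Hello World',
]

pattern = re.compile('American')

# Map each sentence's position in the list to the spans found in it.
spans_by_sentence = {
    index: [(m.start(), m.end()) for m in pattern.finditer(sentence)]
    for index, sentence in enumerate(sentences)
}

for index, spans in spans_by_sentence.items():
    print(index, spans)
# 0 [(8, 16)]
# 1 [(3, 11), (20, 28)]
# 2 []
```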

1
string=['Here in Americans, people say “Can I get a bag for the stuff?”',
 'Typically in restaurant after you are done with meal, you ask for check in Americans from the waiter.',
 'When mixing coffee, people in American use creamer, which is equivalent of milk.']

string2="American"

for sentence in string:
    initial = sentence.find(string2)  # find returns -1 when string2 is absent
    if initial != -1:
        end_point = initial + len(string2)
        print("%d,%d" % (initial, end_point))

0
Here is another possible approach:
all_data = ['Here in Americans, people say “Can I get a bag for the stuff?”',
    'Typically in restaurant after you are done with meal, you ask for check in Americans from the waiter.',
    'When mixing coffee, people in American use creamer, which is equivalent of milk.']


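for data in all_data:
    words = data.split(' ')
    counter = 0  # offset of the current word within data
    for word in words:
        offset = word.find('American')
        if offset != -1:
            start = counter + offset
            print('{}, {}'.format(start, start + len('American')))
        # Always advance past the word and the space that followed it.
        counter += len(word) + 1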
