Extracting numbers from a list of lists

3

I have a list that contains several lists:

my_list=[['word:', 'house', 'garden', '0,2%'],
 ['word:', 'house', 'garden', '0,2%'],
 ['house', 'garden', '0,2%'],
 ['house', 'garden', '0,2%'],
 ['garden', '0,2%', '0,125%'],
 ['house', '0,2%', '?????'],
 ['house', 'garden', '0,02%'],
 ['house', 'garden', '0,02%'],
 ['garden', '0,02%'],
 ['house', 'garden', '0,2%'],
 ['garden', '0,2%'],
 ['house', '0,2'],
 ['house', '0,2', '%'],
 ['house', 'garden', 'kids', '0,2%'],
 ['house', 'garden', 'kids', '0,2%'],
 ['house', '0,2%', 'boy'],
 ['house', '0,12%'],
 ['house', '4%.'],
 ['house', '4%.', '4.'],
 ['house', '0,2%”.']]

I need to extract the numbers according to the words 'house' and 'garden', so that I get a result like this:
{'garden': ['0,2', '0,2', '0,2', '0,2', '0,2', '0.125', '0.02', '0.02', '0.02', '0.2', '0.2', '0,2'], 'house': ['0.2', '0.2', '0.2', '0.2', '0,2', '0,02', '0,02', '0,2', '0,2', '0,2', '0,2', '0,2', '0,2', '0,12', '4.', '4.', '4.', '0,2']}

How can I get these values?

Unfortunately, this code:

result = defaultdict(list)

for l in my_list:
    k = None
    for v in l:
        if v in keywords:
            k = v
        if re.match(r'[0-9,.]+$', v): 
            num = v
    if k is not None:
        result[k].append(num)

does not give me the expected output.

4 Answers

1
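The problem is in your regular expression. You need to remove the $ anchor; otherwise a number will not match when it is followed by anything else (for example, a % character). The rest of the code can also be simplified a little: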
import re
from collections import defaultdict

my_list=[['word:', 'house', 'garden', '0,2%'],
 ['word:', 'house', 'garden', '0,2%'],
 ['house', 'garden', '0,2%'],
 ['house', 'garden', '0,2%'],
 ['garden', '0,2%', '0,125%'],
 ['house', '0,2%', '?????'],
 ['house', 'garden', '0,02%'],
 ['house', 'garden', '0,02%'],
 ['garden', '0,02%'],
 ['house', 'garden', '0,2%'],
 ['garden', '0,2%'],
 ['house', '0,2'],
 ['house', '0,2', '%'],
 ['house', 'garden', 'kids', '0,2%'],
 ['house', 'garden', 'kids', '0,2%'],
 ['house', '0,2%', 'boy'],
 ['house', '0,12%'],
 ['house', '4%.'],
 ['house', '4%.', '4.'],
 ['house', '0,2%".']]

result = defaultdict(list)
keywords = ['house', 'garden']
for l in my_list:
    numbers = [v for v in l if re.match(r'[0-9,.]+', v)]
    for v in l:
        if v in keywords:
            result[v].extend(numbers)
print(result)

Output:

defaultdict(<class 'list'>, {'house': ['0,2%', '0,2%', '0,2%', '0,2%', '0,2%', '0,02%', '0,02%', '0,2%', '0,2', '0,2', '0,2%', '0,2%', '0,2%', '0,12%', '4%.', '4%.', '4.', '0,2%".'], 'garden': ['0,2%', '0,2%', '0,2%', '0,2%', '0,2%', '0,125%', '0,02%', '0,02%', '0,02%', '0,2%', '0,2%', '0,2%', '0,2%']})

0

Here is what you can do:

my_list=[['word:', 'house', 'garden', '0,2%'],
         ['word:', 'house', 'garden', '0,2%'],
         ['house', 'garden', '0,2%'],
         ['house', 'garden', '0,2%'],
         ['garden', '0,2%', '0,125%'],
         ['house', '0,2%', '?????'],
         ['house', 'garden', '0,02%'],
         ['house', 'garden', '0,02%'],
         ['garden', '0,02%'],
         ['house', 'garden', '0,2%'],
         ['garden', '0,2%'],
         ['house', '0,2'],
         ['house', '0,2', '%'],
         ['house', 'garden', 'kids', '0,2%'],
         ['house', 'garden', 'kids', '0,2%'],
         ['house', '0,2%', 'boy'],
         ['house', '0,12%'],
         ['house', '4%.'],
         ['house', '4%.', '4.'],
         ['house', '0,2%”.']]


my_dict = { 'garden':[], 'house':[]}
for lst in my_list:
    for s in lst:
        if any([n in s for n in '1234567890']):
            if 'house' in lst:
                my_dict['house'].append(s.replace('%',''))
            if 'garden' in lst:
                my_dict['garden'].append(s.replace('%',''))
print(my_dict)

Output:

{'garden': ['0,2', '0,2', '0,2', '0,2', '0,2', '0,125', '0,02', '0,02', '0,02', '0,2', '0,2', '0,2', '0,2'], 'house': ['0,2', '0,2', '0,2', '0,2', '0,2', '0,02', '0,02', '0,2', '0,2', '0,2', '0,2', '0,2', '0,2', '0,12', '4.', '4.', '4.', '0,2”.']}
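In this output, replace('%', '') removes only the percent sign, so values such as '4.' and '0,2”.' keep their trailing punctuation. A hedged sketch of a variant that strips every trailing non-digit character instead, assuming each token starts with its number (clean and collect are hypothetical helper names):

```python
import re

def clean(token):
    # Drop all trailing characters that are not digits, e.g. '%', '.', quotes.
    return re.sub(r'[^0-9]+$', '', token)

def collect(rows, keywords=('garden', 'house')):
    found = {k: [] for k in keywords}
    for row in rows:
        for token in row:
            # Only tokens that contain at least one digit are treated as numbers.
            if any(ch.isdigit() for ch in token):
                for k in keywords:
                    if k in row:
                        found[k].append(clean(token))
    return found

rows = [['house', '0,2%”.'],
        ['house', 'garden', '4%.', '4.'],
        ['garden', '0,2', '%']]
print(collect(rows))
# {'garden': ['4', '4', '0,2'], 'house': ['0,2', '4', '4']}
```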

0
You can try the following:
my_list=[['word:', 'house', 'garden', '0,2%'],
 ['word:', 'house', 'garden', '0,2%'],
 ['house', 'garden', '0,2%'],
 ['house', 'garden', '0,2%'],
 ['garden', '0,2%', '0,125%'],
 ['house', '0,2%', '?????'],
 ['house', 'garden', '0,02%'],
 ['house', 'garden', '0,02%'],
 ['garden', '0,02%'],
 ['house', 'garden', '0,2%'],
 ['garden', '0,2%'],
 ['house', '0,2'],
 ['house', '0,2', '%'],
 ['house', 'garden', 'kids', '0,2%'],
 ['house', 'garden', 'kids', '0,2%'],
 ['house', '0,2%', 'boy'],
 ['house', '0,12%'],
 ['house', '4%.'],
 ['house', '4%.', '4.'],
 ['house', '0,2%”.']]

house_list = []
garden_list = []

for arg in my_list:
    if "house" in arg:
        for foo in arg:
            if foo != "house" and foo != "garden" and foo != "word:":
                # keep only digits, commas and periods
                new_foo = ""
                for char in foo:
                    if char in "1234567890,.":
                        new_foo += char
                if new_foo != "":
                    house_list.append(new_foo)
    if "garden" in arg:
        for foo in arg:
            if foo != "house" and foo != "garden" and foo != "word:":
                # keep only digits, commas and periods
                new_foo = ""
                for char in foo:
                    if char in "1234567890,.":
                        new_foo += char
                if new_foo != "":
                    garden_list.append(new_foo)

output = {"house": house_list, "garden": garden_list}
print(output)

0

I think a simpler approach might be this:

my_list=[['word:', 'house', 'garden', '0,2%'],
 ['word:', 'house', 'garden', '0,2%'],
 ['house', 'garden', '0,2%'],
 ['house', 'garden', '0,2%'],
 ['garden', '0,2%', '0,125%'],
 ['house', '0,2%', '?????'],
 ['house', 'garden', '0,02%'],
 ['house', 'garden', '0,02%'],
 ['garden', '0,02%'],
 ['house', 'garden', '0,2%'],
 ['garden', '0,2%'],
 ['house', '0,2'],
 ['house', '0,2', '%'],
 ['house', 'garden', 'kids', '0,2%'],
 ['house', 'garden', 'kids', '0,2%'],
 ['house', '0,2%', 'boy'],
 ['house', '0,12%'],
 ['house', '4%.'],
 ['house', '4%.', '4.'],
 ['house', '0,2%”.']]


output = {'garden': [],
          'house': []}
for line in my_list:
    for keyword in output.keys():
        if keyword in line:
            for element in line:
                if element[0].isnumeric():
                    output[keyword].append(element)
print(output)

Output:

{'garden': ['0,2%', '0,2%', '0,2%', '0,2%', '0,2%', '0,125%', '0,02%', '0,02%', '0,02%', '0,2%', '0,2%', '0,2%', '0,2%'], 'house': ['0,2%', '0,2%', '0,2%', '0,2%', '0,2%', '0,02%', '0,02%', '0,2%', '0,2', '0,2', '0,2%', '0,2%', '0,2%', '0,12%', '4%.', '4%.', '4.', '0,2%”.']}

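Explanation: I loop over the lists inside your list (line), then over the desired keywords ('garden', 'house'), and check whether the keyword is in line. If it is, I loop over the elements of line to find each element that starts with a digit (this is my assumption) and append it to the output dictionary.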
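One caveat with element[0] is that it raises IndexError if any element is an empty string. A small sketch of the same loop with that case guarded, assuming empty strings may appear in the data (starts_with_digit and group_by_keyword are illustrative names, not part of the answer):

```python
def starts_with_digit(element):
    # element[:1] is '' for an empty string and ''.isdigit() is False,
    # so this does not raise IndexError the way element[0] would.
    return element[:1].isdigit()

def group_by_keyword(lines, keywords):
    output = {keyword: [] for keyword in keywords}
    for line in lines:
        for keyword in keywords:
            if keyword in line:
                output[keyword].extend(e for e in line if starts_with_digit(e))
    return output

lines = [['house', '', '0,2%'],
         ['garden', '0,125%', 'kids']]
print(group_by_keyword(lines, ['garden', 'house']))
# {'garden': ['0,125%'], 'house': ['0,2%']}
```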

