Filtering a dictionary of lists

21

I have a dictionary like this:

{"level": [1, 2, 3],
 "conf": [-1, 1, 2],
 "text": ["here", "hel", "llo"]}

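I want to filter the lists so that the item at every index i is removed wherever the value at index i in "conf" is not > 0.

So for the above dict, the output should be: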

{"level": [2, 3],
 "conf": [1, 2],
 "text": ["hel", "llo"]}

since the first value of conf is not greater than 0.

I tried something like this:

new_dict = {i: [a for a in j if a >= min_conf] for i, j in my_dict.items()}

But this only works for a single key.


1
https://numpy.org/doc/stable/user/absolute_beginners.html#whats-the-difference-between-a-python-list-and-a-numpy-array - Nate T
11 Answers

15

Try:

from operator import itemgetter


def filter_dictionary(d):
    positive_indices = [i for i, item in enumerate(d['conf']) if item > 0]
    f = itemgetter(*positive_indices)
    return {k: list(f(v)) for k, v in d.items()}


d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", "hel", "llo"]}
print(filter_dictionary(d))

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

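I first find which indices of 'conf' hold positive values, then use itemgetter to pick those indices from each value in the dictionary.

A more compact version that uses a generator expression instead of a temporary list: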


def filter_dictionary(d):
    f = itemgetter(*(i for i, item in enumerate(d['conf']) if item > 0))
    return {k: list(f(v)) for k, v in d.items()}
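
One caveat worth checking with itemgetter: called with a single index it returns the bare item rather than a tuple, and called with no indices at all it raises TypeError. The sketch below is one way to sidestep both cases with plain indexing. The name filter_dictionary_safe and the key parameter are hypothetical, introduced here for illustration only.

```python
def filter_dictionary_safe(d, key="conf"):
    # Indices whose value under `key` is strictly positive.
    keep = [i for i, item in enumerate(d[key]) if item > 0]
    # Plain indexing always builds a list, even for zero or one kept index.
    return {k: [v[i] for i in keep] for k, v in d.items()}


one_kept = {"level": [1, 2], "conf": [-1, 1], "text": ["a", "b"]}
none_kept = {"level": [1], "conf": [-1], "text": ["a"]}
print(filter_dictionary_safe(one_kept))
print(filter_dictionary_safe(none_kept))
```

For inputs that always keep at least two indices, the itemgetter versions above behave the same way.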

11
Here is a one-liner:
dct = {k: [x for i, x in enumerate(v) if d['conf'][i] > 0] for k, v in d.items()}

Output:

>>> dct
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

Using the sample data:

d = {"level":[1,2,3], "conf":[-1,1,2], "text":["here","hel","llo"]}
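
The same per-index test can also be written with itertools.compress, which keeps the items of one iterable wherever a parallel iterable of selectors is truthy. This is a different technique from the enumerate-based one-liner, shown only as a sketch; the name mask is chosen here for illustration.

```python
from itertools import compress

d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["here", "hel", "llo"]}

# One boolean per index: True where the "conf" value is positive.
mask = [c > 0 for c in d["conf"]]
# compress() yields the items of v whose matching mask entry is True.
dct = {k: list(compress(v, mask)) for k, v in d.items()}
print(dct)
```

Building the mask once avoids looking up d['conf'][i] again for every item of every list.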

10

I would store the indices of the valid elements (those greater than 0) like this:

kept_keys = [i for i in range(len(my_dict['conf'])) if my_dict['conf'][i] > 0]

Then you can filter each list by checking whether an element's index is contained in kept_keys:

{k: list(map(lambda x: x[1], filter(lambda x: x[0] in kept_keys, enumerate(my_dict[k])))) for k in my_dict}

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

5
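The data structure you describe seems better modeled as a pandas DataFrame: you are essentially treating the data as a 2D grid, and you want to filter out rows of that grid based on the values in one column.
The following snippet does what you need, using a DataFrame as an intermediate representation: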
import pandas as pd

data = {"level":[1,2,3], "conf":[-1,1,2], "text":["here","hel","llo"]}
df = pd.DataFrame(data)
df = df.loc[df["conf"] > 0]
result = df.to_dict(orient="list")

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

Note, however, that this gets even simpler if you represent the data as a DataFrame from the start and keep it in that form afterwards:
data = pd.DataFrame({
    "level":[1,2,3],
    "conf":[-1,1,2],
    "text":["here","hel","llo"],
})

result = data.loc[data["conf"] > 0]

Output:

   level  conf text
1      2     1  hel
2      3     2  llo

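This is more concise, more expressive, and (for large inputs) more performant than any "pure dict" solution.
If the other operations you want to perform on this data are similar (in the sense of "2D array" operations), they are likely also expressed more naturally in terms of a DataFrame, so keeping the data as a DataFrame may be preferable to converting it back to a dict.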

1
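Maybe it's just a nitpick, but "represent your data as a DataFrame from the start" seems misleading. If the data is returned from a third-party function, or is very large, it makes no sense to inline it into the DataFrame call. What matters is getting the data into a df, regardless of how it gets there. - wjandrea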

4
I solved it like this:
from typing import Dict, List, Any, Set

d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1", "hel", "llo"]}

# First, we create a set that stores the indices which should be kept.
# I chose a set instead of a list because it has a O(1) lookup time.
# We only want to keep the items on indices where the value in d["conf"] is greater than 0
filtered_indexes = {i for i, value in enumerate(d.get('conf', [])) if value > 0}

def filter_dictionary(d: Dict[str, List[Any]], filtered_indexes: Set[int]) -> Dict[str, List[Any]]:
    filtered_dictionary = d.copy()  # We'll return a modified copy of the original dictionary
    for key, list_values in d.items():
        # In the next line the actual filtering for each key/value pair takes place. 
        # The original lists get overwritten with the filtered lists.
        filtered_dictionary[key] = [value for i, value in enumerate(list_values) if i in filtered_indexes]
    return filtered_dictionary

print(filter_dictionary(d, filtered_indexes))

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

3

Try this. It is simple and easy to understand, especially for beginners:

a_dict = {"level": [1, 2, 3, 4, 5, 8], "conf": [-1, 1, -1, -2], "text": ["-1", "hel", "llo", "ai", 0, 9]}

# iterate backwards over the list keeping the indexes
for index, item in reversed(list(enumerate(a_dict["conf"]))):
    if item <= 0:
        for lists in a_dict.values():
            del lists[index]
print(a_dict)

Output:

{'level': [2, 5, 8], 'conf': [1], 'text': ['hel', 0, 9]}

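Nice, this is clever! The other answers build a list of indices first, but picking items off one at a time is simpler. It may perform worse, but it is easier to understand. - wjandrea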
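
The backward iteration matters because deleting an item shifts every later item one position to the left. A small sketch of what goes wrong when positions computed up front are deleted front to back (the names forward and backward are only for illustration):

```python
conf = [-1, -2, 1, 2]
# Positions of the non-positive values, computed before any deletion.
to_delete = [i for i, c in enumerate(conf) if c <= 0]  # [0, 1]

forward = list(conf)
for i in to_delete:
    # After position 0 is deleted, the old position 1 has moved to 0,
    # so the second deletion removes the wrong item.
    del forward[i]

backward = list(conf)
for i in reversed(to_delete):
    # Deleting from the end leaves the earlier positions untouched.
    del backward[i]

print(forward)   # [-2, 2]
print(backward)  # [1, 2]
```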

3
a = {"level":[1,2,3,4], "conf": [-1,1,2,-1],"text": ["-1","hel","llo","test"]}

# inefficient solution
# for k, v in a.items():
#     if k == "conf":
#         start_search = 0
#         to_delete = [] #it will store the index numbers of the conf that you want to delete(conf<0)
#         for element in v:
#             if element < 0:
#                 to_delete.append(v.index(element,start_search))
#                 start_search = v.index(element) + 1

#more efficient and elegant solution
to_delete = [i for i, element in enumerate(a["conf"]) if element <= 0]  # <= 0: the question drops every value that is not > 0
for position in list(reversed(to_delete)):
    for k, v in a.items():
        v.pop(position)

The result will be

>>> a
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

3

Lots of good answers. Here is another approach that makes two passes:

mydict = {"level": [1, 2, 3], "conf": [-1, 1, 2], 'text': ["-1", "hel", "llo"]}

for i, v in enumerate(mydict['conf']):
    if v <= 0:
        for key in mydict.keys():
            mydict[key][i] = None

for key in mydict.keys():
    mydict[key] = [v for v in mydict[key] if v is not None]

print(mydict)

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

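It would be good to add some explanation of how this code works. A sentence or two is enough, e.g. "Get the index of each negative 'conf' value, set it to None across the whole dict, then filter those out." - wjandrea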
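
One thing to watch with this approach: if any list legitimately contains None, the second pass drops those items too. A sketch of the same two passes using a private sentinel object instead. The name _REMOVED is hypothetical, introduced here, and the None in the sample data is added on purpose to show the difference.

```python
_REMOVED = object()  # unique marker that cannot collide with real data

mydict = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", None, "llo"]}

# Pass 1: mark every position whose "conf" value is not positive.
for i, v in enumerate(mydict["conf"]):
    if v <= 0:
        for key in mydict:
            mydict[key][i] = _REMOVED

# Pass 2: drop the marked positions; a genuine None survives.
for key in mydict:
    mydict[key] = [v for v in mydict[key] if v is not _REMOVED]

print(mydict)
```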

3
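Here is an approach using numpy:
import numpy as np

dct = {"level":[1,2,3], "conf":[-1,1,2], "text":["here","hel","llo"]}
arrays = {k: np.array(v) for k, v in dct.items()}  # one array per key
mask = arrays['conf'] > 0  # boolean mask: True where conf is positive
dct = {k: v[mask].tolist() for k, v in arrays.items()}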

Output:

>>> dct
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}

3
You can write a function that works out which indices to keep, and rebuild each list from those indices:
my_dict = {"level":[1,2,3], "conf":[-1,1,2],'text':["-1","hel","llo"]}

def remove_corresponding_items(d, key):
    keep_indexes = [idx for idx, value in enumerate(d[key]) if value>0]
    for key, lst in d.items():
        d[key] = [lst[idx] for idx in keep_indexes]

remove_corresponding_items(my_dict, 'conf')
print(my_dict)

This prints the requested output.
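
The function above modifies the dictionary in place. Where a new dictionary is preferred, one alternative is to treat the problem row-wise: transpose the lists into rows with zip, filter the rows, and transpose back. This is a different technique from the index-based one, offered as a sketch; filter_rows is a hypothetical name, and it assumes all lists have the same length (zip silently truncates to the shortest one).

```python
def filter_rows(d, key="conf"):
    keys = list(d)
    pos = keys.index(key)  # column position of the filter key
    # zip(*values) turns the parallel lists into one tuple per index.
    rows = [row for row in zip(*d.values()) if row[pos] > 0]
    # Transpose back; with no surviving rows, fall back to empty columns.
    columns = zip(*rows) if rows else ([] for _ in keys)
    return {k: list(col) for k, col in zip(keys, columns)}


my_dict = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", "hel", "llo"]}
result = filter_rows(my_dict)
print(result)
```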

