Find the indices of nan elements in a nested list and remove them

6
import numpy as np

names=[['Pat','Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values=[[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

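I have two lists: names and values. Each value corresponds to a name, i.e. Pat corresponds to the value 1, Sam corresponds to the value 9, and so on.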

I want to remove the nan entries from names and the corresponding values from values.

That is, I want a list new_names that looks like this:

[['Pat','Sam', 'Tom', ''], ["Angela", "James", ".", "Jackie"]]

and a list new_values that looks like this:

[[1, 9, 2, 1], [1, 1, 5, 10]]

My attempt was to first find the indices of these nan entries:

import pandas as pd

all_nan_idx = []
for idx, name in enumerate(names):
    if pd.isnull(name):
        all_nan_idx.append(idx)

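However, this does not account for the nested lists: each name here is itself a list, so pd.isnull(name) returns an array of booleans rather than a single True/False, and using that array in the if raises a ValueError.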
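One way to make the index-based attempt work is to record (outer, inner) index pairs instead of a single index, and then delete those positions from both lists. A minimal sketch, assuming the names/values data from the question; the nan_positions name and the copy-then-delete-in-reverse strategy are illustrative choices, not taken from any of the answers.

```python
import numpy as np
import pandas as pd

names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

# Record an (outer, inner) index pair for every missing name.
nan_positions = [
    (i, j)
    for i, sublist in enumerate(names)
    for j, name in enumerate(sublist)
    if pd.isnull(name)
]
print(nan_positions)  # [(0, 2), (1, 1)]

# Work on copies so the original lists stay intact.
new_names = [list(sublist) for sublist in names]
new_values = [list(sublist) for sublist in values]

# Delete from the back so earlier deletions don't shift later indices.
for i, j in reversed(nan_positions):
    del new_names[i][j]
    del new_values[i][j]

print(new_names)   # [['Pat', 'Sam', 'Tom', ''], ['Angela', 'James', '.', 'Jackie']]
print(new_values)  # [[1, 9, 2, 1], [1, 1, 5, 10]]
```

Deleting in reverse order matters when one sublist contains several nan entries: removing a lower index first would shift the positions of the ones after it.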

7 answers

2

Something like this?

import numpy as np
import pandas as pd

names=[['Pat','Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values=[[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

new_names = []
new_values = []
for names_, values_ in zip(names, values):
    n = []
    v = []
    for name, value in zip(names_, values_):
        if not pd.isnull(name):
            n.append(name)
            v.append(value)
    new_names.append(n)
    new_values.append(v)
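Because pd.isnull tests each scalar, this kind of filter also drops None and float('nan') entries, not only the np.nan object. A small check of that, using the same pd.isnull filter wrapped in a hypothetical helper named drop_missing:

```python
import numpy as np
import pandas as pd

def drop_missing(names, values):
    """Return new names/values lists without the entries whose name is null."""
    new_names, new_values = [], []
    for names_, values_ in zip(names, values):
        # Keep (name, value) pairs whose name is not None/NaN.
        kept = [(name, value) for name, value in zip(names_, values_) if not pd.isnull(name)]
        new_names.append([name for name, _ in kept])
        new_values.append([value for _, value in kept])
    return new_names, new_values

# None and float('nan') are treated as missing, just like np.nan.
names = [['Pat', None, 'Tom'], [float('nan'), 'Angela', np.nan]]
values = [[1, 2, 3], [4, 5, 6]]
print(drop_missing(names, values))
# ([['Pat', 'Tom'], ['Angela']], [[1, 3], [5]])
```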

2

There is probably some cryptic way to do this, but here is a step-by-step approach:

import numpy as np

names = [
    ['Pat', 'Sam', np.nan, 'Tom', ''],
    ["Angela", np.nan, "James", ".", "Jackie"]
    ]
values = [
    [1, 9, 1, 2, 1],
    [1, 3, 1, 5, 10]
    ]

new_names = []
new_values = []

for nn, vv in zip(names, values):
    new_names.append([])
    new_values.append([])
    for n, v in zip(nn, vv):
        if n is not np.nan:  # identity check: matches only the np.nan object itself
            new_names[-1].append(n)
            new_values[-1].append(v)


print(new_names)
print(new_values)

Output:

[['Pat', 'Sam', 'Tom', ''], ['Angela', 'James', '.', 'Jackie']]
[[1, 9, 2, 1], [1, 1, 5, 10]]
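Note that n is not np.nan is an identity test: it only recognises the np.nan object itself, so a NaN created some other way, such as float('nan'), is a different object and would be kept. Since NaN is the only value that compares unequal to itself, n == n works as a dependency-free check instead. A minimal sketch (unlike pd.isnull, this does not treat None as missing):

```python
import numpy as np

a = float('nan')
print(a is np.nan)  # False: a different NaN object
print(a != a)       # True: NaN never equals itself

names = [['Pat', 'Sam', float('nan'), 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

# n == n is False only for NaN, so every string (including '' and '.') is kept.
new_names = [[n for n in nn if n == n] for nn in names]
new_values = [[v for n, v in zip(nn, vv) if n == n] for nn, vv in zip(names, values)]

print(new_names)   # [['Pat', 'Sam', 'Tom', ''], ['Angela', 'James', '.', 'Jackie']]
print(new_values)  # [[1, 9, 2, 1], [1, 1, 5, 10]]
```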

1
Using a recursive function:
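import numpy as np

def filter_nan(names, values):
  new_names, new_values = [], []

  # strict=True (Python 3.10+) raises ValueError if names and values differ in length
  for name, value in zip(names, values, strict=True):
    if name is np.nan:
      continue

    # Recurse into matching sublists
    if isinstance(name, list) and isinstance(value, list):
      name, value = filter_nan(name, value)

    # Keep plain names and already-filtered sublists alike
    new_names.append(name)
    new_values.append(value)

  return new_names, new_values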

Try it:

names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

print(filter_nan(names, values))

'''
(
  [['Pat', 'Sam', 'Tom', ''], ['Angela', 'James', '.', 'Jackie']],
  [[1, 9, 2, 1], [1, 1, 5, 10]]
)
'''

1

Maybe a bit overkill, but here is another option:

import numpy as np

names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

new_names = []
new_values = []

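for aux_list in zip(names, values):
    # zip(*aux_list) pairs each name with its value; filter keeps pairs whose name is not nan,
    # and the outer zip(*...) splits them back into a names tuple and a values tuple
    filtered_names, filtered_values = zip(*filter(lambda x: x[0] is not np.nan, zip(*aux_list)))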
    new_names.append(list(filtered_names))
    new_values.append(list(filtered_values))
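One edge case with this approach: if every name in a sublist is nan, filter yields nothing, zip(*...) is empty, and unpacking it into two names raises ValueError. A sketch of one way to guard against that; falling back to two empty tuples (so the sublist becomes []) is an assumption about the desired result.

```python
import numpy as np

names = [['Pat', np.nan], [np.nan, np.nan]]
values = [[1, 2], [3, 4]]

new_names = []
new_values = []

for aux_list in zip(names, values):
    kept = list(filter(lambda x: x[0] is not np.nan, zip(*aux_list)))
    # zip(*kept) is empty when nothing survives the filter, so fall back to ((), ()).
    filtered_names, filtered_values = zip(*kept) if kept else ((), ())
    new_names.append(list(filtered_names))
    new_values.append(list(filtered_values))

print(new_names)   # [['Pat'], []]
print(new_values)  # [[1], []]
```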

1

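Here is a simpler, index-based way to handle this. It treats any float in names as missing, which works here because np.nan is a float and all the real names are strings.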

import numpy as np

names=[['Pat','Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values=[[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

new_names = []
new_values = []

for i in range(len(names)):
    new_names.append([])
    new_values.append([])
    for j in range(len(names[i])):
        if not isinstance(names[i][j], float):  # np.nan is a float; every real name here is a str
            new_names[i].append(names[i][j])
            new_values[i].append(values[i][j])
            
print(new_names)
print(new_values)
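The isinstance(..., float) test would also drop any legitimate float stored in names. If that can happen, a narrower check is to drop only floats that are NaN. A sketch using math.isnan, with a hypothetical helper named is_missing and made-up example data:

```python
import math
import numpy as np

def is_missing(x):
    """True only for float NaN values; other floats and strings are kept."""
    return isinstance(x, float) and math.isnan(x)

names = [['Pat', 2.5, np.nan], [np.nan, 'Angela']]
values = [[1, 2, 3], [4, 5]]

new_names = []
new_values = []
for i in range(len(names)):
    new_names.append([])
    new_values.append([])
    for j in range(len(names[i])):
        if not is_missing(names[i][j]):
            new_names[i].append(names[i][j])
            new_values[i].append(values[i][j])

print(new_names)   # [['Pat', 2.5], ['Angela']]
print(new_values)  # [[1, 2], [5]]
```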




1
Here is a solution using pandas (names and values as defined in the question):
import pandas as pd


result = []
for n, v in zip(names, values):
    n = pd.Series(n).dropna()  # dropna keeps the original index labels of the remaining names
    result.append((n.tolist(), pd.Series(v).loc[n.index].tolist()))  # pick the values at those labels

names, values = map(list, zip(*result))

If you are using Python >= 3.8, you can also write it as a single statement with an assignment expression (:=):

import pandas as pd

names, values = map(list, zip(*(
    ((s := pd.Series(n).dropna()).tolist(), pd.Series(v).loc[s.index].tolist())
    for n, v in zip(names, values)
)))
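The same idea can also be expressed with a boolean mask instead of index alignment: pd.isnull on an object array returns an element-wise mask, which can then index both the names and the values. A sketch assuming each pair of sublists has equal length; converting to NumPy object arrays is a choice made here, not part of the answer above.

```python
import numpy as np
import pandas as pd

names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

new_names, new_values = [], []
for n, v in zip(names, values):
    n = np.array(n, dtype=object)  # object dtype keeps strings and nan unchanged
    keep = ~pd.isnull(n)           # element-wise mask: True where the name is present
    new_names.append(n[keep].tolist())
    new_values.append(np.asarray(v)[keep].tolist())

print(new_names)   # [['Pat', 'Sam', 'Tom', ''], ['Angela', 'James', '.', 'Jackie']]
print(new_values)  # [[1, 9, 2, 1], [1, 1, 5, 10]]
```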

1
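To do this efficiently in a single statement, you can transpose each pair of input lists into a sequence of name-value pairs, filter out the pairs with a null name using a generator expression, and then transpose the result back into two lists.
import pandas as pd

new_names, new_values = map(list, zip(*(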
    map(list, zip(*(
        (name, value)
        for name, value in zip(*pairs)
        if not pd.isnull(name)
    )))
    for pairs in zip(names, values)
)))

Demo: https://replit.com/@blhsing/EnormousHarshFreesoftware#main.py
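To see what the two transpositions in that statement do, here is the same pipeline unrolled for a single pair of sublists (names_row, values_row, pairs and kept are illustrative names):

```python
import numpy as np
import pandas as pd

names_row = ['Pat', 'Sam', np.nan, 'Tom', '']
values_row = [1, 9, 1, 2, 1]

# Transpose two parallel lists into (name, value) pairs.
pairs = list(zip(names_row, values_row))
print(pairs[:3])  # [('Pat', 1), ('Sam', 9), (nan, 1)]

# Drop the pairs whose name is null.
kept = [(name, value) for name, value in pairs if not pd.isnull(name)]

# Transpose back into two lists.
filtered_names, filtered_values = map(list, zip(*kept))
print(filtered_names)   # ['Pat', 'Sam', 'Tom', '']
print(filtered_values)  # [1, 9, 2, 1]
```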

Content from Stack Overflow.