Adding missing values in a list of dictionaries

Question


3

I have a list like this:

[
{
    "timeline": "2014-10", 
    "total_prescriptions": 17
}, 
{
    "timeline": "2014-11", 
    "total_prescriptions": 14
}, 
{
    "timeline": "2014-12", 
    "total_prescriptions": 8
}, 
{
    "timeline": "2015-1", 
    "total_prescriptions": 4
}, 
{
    "timeline": "2015-3", 
    "total_prescriptions": 10
}, 
{
    "timeline": "2015-4", 
    "total_prescriptions": 3
} 
]

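This is basically the output of a raw SQL query in Django, which counts total_prescriptions by month and sorts the data in ascending order. However, the nature of MySQL's COUNT is that it does not return 0 for null values. As a result, February is skipped entirely instead of appearing as an entry with total_prescriptions equal to 0.

I plan to iterate over the list in Python and manually add total_prescriptions = 0 for every missing month, so that the output looks like this: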

[
{
    "timeline": "2014-10", 
    "total_prescriptions": 17
}, 
{
    "timeline": "2014-11", 
    "total_prescriptions": 14
}, 
{
    "timeline": "2014-12", 
    "total_prescriptions": 8
}, 
{
    "timeline": "2015-1", 
    "total_prescriptions": 4
}, 
{
    "timeline": "2015-2", 
    "total_prescriptions": 0
}, 
{
    "timeline": "2015-3", 
    "total_prescriptions": 10
}, 
{
    "timeline": "2015-4", 
    "total_prescriptions": 3
} 
]

How can I do this?

- Amistad

When did I say I had a dictionary of dictionaries? - Amistad

Oh, sorry, corrected. - Amistad

5 Answers

1

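Keep a reference list of the known month labels, then collect every element of that reference list that is absent from the fetched data. In the code below, dc is your list of dictionaries.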

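# min_year/max_year: first and last year in the data (rows are sorted ascending)
min_year, max_year = (int(d['timeline'].split('-')[0]) for d in (dc[0], dc[-1]))
ref = [str(x) + '-' + str(i) for x in range(min_year, max_year+1) for i in range(1,13)]
existing = {i['timeline'] for i in dc}  # build the lookup once
missing_timelines = [r for r in ref if r not in existing]
for m in missing_timelines:
  dc.append({"timeline": m, "total_prescriptions": 0})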

Mic4ael's idea is also good. Ideally, you should handle this in the database.

- saarrrr

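The problem is that the year range is not fixed. It is not limited to 2014 through 2016; it could be any two years. - Amistad

But is it always two? Does it have to be two? Could it be four? - saarrrr

Yes, it depends entirely on the start and end dates in the query. If the query runs from 2009 to 2015, it retrieves the prescription totals for every month from 2009 to 2015. So it can span any number of years, but it is always in ascending date order (if that helps). - Amistad

Replace 2014 and 2016 with min_year and max_year + 1. Make sure those values are stored in variables and passed to the function that fixes up the data returned by the database. - saarrrr

saarrr, this is nice, but I ended up using Pandas. Pandas turned out to be faster, and its timeline resampling is exactly what I needed. - Amistad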
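As a sketch of the reference-list idea from this answer, the version below only walks the months between the first and last entries, so it does not pad whole calendar years, and it returns the rows already in order. The function name fill_missing_months is made up for illustration, and the code assumes every timeline value has the form "YYYY-M" as in the question.

```python
def fill_missing_months(rows):
    """Return a new list with one entry per month between the first and last row."""
    if not rows:
        return []

    def parse(label):
        # "2015-3" -> (2015, 3); tuples compare chronologically
        year, month = label.split("-")
        return int(year), int(month)

    counts = {parse(r["timeline"]): r["total_prescriptions"] for r in rows}
    end = max(counts)
    year, month = min(counts)

    filled = []
    while (year, month) <= end:
        filled.append({
            "timeline": f"{year}-{month}",
            # months absent from the query result get a zero
            "total_prescriptions": counts.get((year, month), 0),
        })
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return filled
```

Calling fill_missing_months(dc) leaves dc untouched and gives back the padded list, which avoids the extra sort that appending to dc would need.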

1

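I ended up using Pandas for this, since it is noticeably faster on larger datasets and does the job in a neat, elegant way. In Pandas this is called "resampling"; first convert your times to numpy datetimes and set them as the index: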

>>> import pandas as pd
>>> df = pd.DataFrame(L)  # where L is my list of dictionaries
>>> df.index = pd.to_datetime(df.timeline, format='%Y-%m')
>>> df
           timeline  total_prescriptions
timeline
2014-10-01  2014-10                   17
2014-11-01  2014-11                   14
2014-12-01  2014-12                    8
2015-01-01   2015-1                    4
2015-03-01   2015-3                   10
2015-04-01   2015-4                    3

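You can then use resample('MS') to add the missing months and asfreq(fill_value=0) to turn the resulting gaps into zeros (older pandas versions accepted fillna(0) directly on the resample result):

>>> df = df[['total_prescriptions']].resample('MS').asfreq(fill_value=0)
>>> df
            total_prescriptions
timeline
2014-10-01                   17
2014-11-01                   14
2014-12-01                    8
2015-01-01                    4
2015-02-01                    0
2015-03-01                   10
2015-04-01                    3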

- Amistad

0

I'm not sure how efficient this code is, but it may help you. Try this:

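presub = [...]  # your list of dictionaries, in ascending timeline order
filled = []
prev = None  # (year, month) of the previous entry
for entry in presub:
  year, month = (int(p) for p in entry["timeline"].split("-"))
  # Step forward from the previous month, adding a zero entry for each skipped month.
  while prev is not None and prev[0] * 12 + prev[1] + 1 < year * 12 + month:
    prev = (prev[0] + prev[1] // 12, prev[1] % 12 + 1)
    filled.append({"timeline": str(prev[0]) + "-" + str(prev[1]), "total_prescriptions": 0})
  filled.append(entry)
  prev = (year, month)

presub = filled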

- Guy Goldenberg

0

I changed my approach. The list you start with is my_list.

def getDate(entry):
    """
    Given a list entry dict, return a tuple of ints:
    (year, month)
    """
    date = entry['timeline']
    i = date.index('-')
    month = int(date[i + 1:])
    year = int(date[:i])
    return (year, month)

def supplyMissing(year, month, n):
    """
    Given a year, month, & number of missing entries (ints),
    return a list of entries (dicts)
    """
    entries = []
    for e in range(n):
        if month == 12:
            year += 1
            month = 1
        else:
            month += 1
        entries.append({'timeline': str(year) + '-' + str(month),
                        'total_prescriptions': 0})
    return entries

# Make a copy of the list to work with:
new_list = list(my_list)

# Offset applied to insert positions as entries are added
c_count = 0

# Iterate over the list
for i in range(len(my_list) - 1):
    entry = my_list[i]
    next_entry = my_list[i + 1]
    year, month = getDate(entry)
    next_year, next_month = getDate(next_entry)

    if ((next_year == year and next_month == month + 1) or
        (next_year == year + 1 and next_month == month - 11)):
        pass
    # If entries are not sequential, determine what to add.
    else:
        # How many months are missing?
        if next_year == year:
            missing_months = next_month - month - 1
        else:
            dif_years = next_year - year
            missing_months = 12 * dif_years + next_month - month - 1

        # Generate missing entries
        missing_entries = supplyMissing(year, month, missing_months)

        # Insert missing entries into the temporary list.
        for m in range(missing_months):
            new_list.insert(i + 1 + m + c_count, missing_entries[m])
        # Shift later insert positions by the number of entries just added.
        c_count += missing_months

# Finalize the result
my_list = new_list

- Jonathan Clede

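You cannot assume the start and end years are 2014 and 2015. It could be any two years. - Amistad

Changed the approach so it handles any list, whatever the start and end dates are. - Jonathan Clede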
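The gap arithmetic in Jonathan Clede's answer (same-year and cross-year cases, December rollover) can be collapsed by mapping each (year, month) pair to a single running month count. The following is a minimal sketch of that idea; the helper names month_index, months_missing_between and index_to_label are hypothetical and not part of the original answer.

```python
def month_index(year, month):
    """Map (year, month) to a running count of months, e.g. (2015, 1) -> 24180."""
    return year * 12 + (month - 1)


def months_missing_between(first, second):
    """Number of months strictly between two (year, month) tuples."""
    return month_index(*second) - month_index(*first) - 1


def index_to_label(index):
    """Inverse of month_index, formatted like the question's timeline values."""
    year, month0 = divmod(index, 12)
    return f"{year}-{month0 + 1}"


# Gap between January and March 2015: one month (February) is missing.
gap = months_missing_between((2015, 1), (2015, 3))
start = month_index(2015, 1)
missing_labels = [index_to_label(start + k) for k in range(1, gap + 1)]
```

With this mapping, consecutive months always differ by exactly one, so the year boundary needs no special case.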


- mic4ael · Accepted Answer

2

Maybe you could use COALESCE to return 0 when a NULL value occurs.

- mic4ael

Tried the COALESCE function, but it seems the query does not even return a NULL value; it simply omits the row altogether. It did not work. - Amistad
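The comment above points at the underlying cause: a grouped query produces no row at all for a month without data, so there is nothing for COALESCE to act on. One common way to handle this in the database is to LEFT JOIN from a generated list of months onto the data. The sketch below demonstrates the idea with Python's built-in sqlite3 module standing in for MySQL; the table name prescriptions, the column created, and the sample dates are all assumptions made for illustration, and the date functions would need to be adapted to MySQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per prescription with its creation date.
conn.execute("CREATE TABLE prescriptions (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO prescriptions (created) VALUES (?)",
    [("2015-01-05",), ("2015-01-20",), ("2015-03-02",)],  # nothing in February
)

query = """
WITH RECURSIVE months(first_day) AS (
    SELECT '2015-01-01'
    UNION ALL
    SELECT date(first_day, '+1 month') FROM months
    WHERE first_day < '2015-03-01'
)
SELECT strftime('%Y-%m', m.first_day) AS timeline,
       COUNT(p.id) AS total_prescriptions   -- counts 0 when no row matched
FROM months AS m
LEFT JOIN prescriptions AS p
       ON strftime('%Y-%m', p.created) = strftime('%Y-%m', m.first_day)
GROUP BY m.first_day
ORDER BY m.first_day
"""
rows = conn.execute(query).fetchall()
for timeline, total in rows:
    print(timeline, total)
```

Because every month comes from the generated months list, February appears with a count of 0. COALESCE becomes useful in this shape of query when the aggregate is something like SUM, which yields NULL for a month with no matching rows.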