Group list items by nth week using pandas

4
I have some data for the next 10 days.
[{'cover_image': 'TODO - s3 link', 'epoch': 1497403800000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497490200000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497576600000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497663000000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497749400000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497835800000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497922200000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1498008600000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1498095000000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1498181400000}]

Using the week number, I want to group the data into this week and next week.

I need something like this:

{
    '24': [# list of items for this week],
    '25': [# list of items for next week]
}
# i.e.
{'24': [{'cover_image': 'TODO - s3 link', 'epoch': 1497403800000},
  {'cover_image': 'TODO - s3 link', 'epoch': 1497490200000},
  {'cover_image': 'TODO - s3 link', 'epoch': 1497576600000},
  {'cover_image': 'TODO - s3 link', 'epoch': 1497663000000},
  {'cover_image': 'TODO - s3 link', 'epoch': 1497749400000}],
'25': [{'cover_image': 'TODO - s3 link', 'epoch': 1497835800000},
  {'cover_image': 'TODO - s3 link', 'epoch': 1497922200000},
  {'cover_image': 'TODO - s3 link', 'epoch': 1498008600000},
  {'cover_image': 'TODO - s3 link', 'epoch': 1498095000000},
  {'cover_image': 'TODO - s3 link', 'epoch': 1498181400000}]
}

Using the pandas library, I tried the following:

In [89]: df = pandas.DataFrame(data)

In [90]: df.index = pandas.to_datetime(df['epoch'], unit='ms')

In [103]: df['label'] = df.index.week

In [104]: df
Out[104]: 
                        cover_image          epoch  label
epoch                                                    
2017-06-14 01:30:00  TODO - s3 link  1497403800000     24
2017-06-15 01:30:00  TODO - s3 link  1497490200000     24
2017-06-16 01:30:00  TODO - s3 link  1497576600000     24
2017-06-17 01:30:00  TODO - s3 link  1497663000000     24
2017-06-18 01:30:00  TODO - s3 link  1497749400000     24
2017-06-19 01:30:00  TODO - s3 link  1497835800000     25
2017-06-20 01:30:00  TODO - s3 link  1497922200000     25
2017-06-21 01:30:00  TODO - s3 link  1498008600000     25
2017-06-22 01:30:00  TODO - s3 link  1498095000000     25
2017-06-23 01:30:00  TODO - s3 link  1498181400000     25

In [106]: df.groupby('label').groups
Out[106]: 
{24: DatetimeIndex(['2017-06-14 01:30:00', '2017-06-15 01:30:00',
                '2017-06-16 01:30:00', '2017-06-17 01:30:00',
                '2017-06-18 01:30:00'],
               dtype='datetime64[ns]', name=u'epoch', freq=None),
 25: DatetimeIndex(['2017-06-19 01:30:00', '2017-06-20 01:30:00',
                '2017-06-21 01:30:00', '2017-06-22 01:30:00',
                '2017-06-23 01:30:00'],
               dtype='datetime64[ns]', name=u'epoch', freq=None)}

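Because my knowledge of pandas is limited, I couldn't get any further.

It would be great if the week-number keys could be changed to this_week, next_week and future.

Please help me solve this.

1 Answer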

3

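It looks like you need:

import pandas as pd

df = pd.DataFrame(data)
df.index = pd.to_datetime(df['epoch'], unit='ms')

# DatetimeIndex.week was removed in pandas 2.0; isocalendar().week gives the same ISO week number
d = dict(tuple(df.groupby(df.index.isocalendar().week)))

print(d[24])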
                        cover_image          epoch
epoch                                             
2017-06-14 01:30:00  TODO - s3 link  1497403800000
2017-06-15 01:30:00  TODO - s3 link  1497490200000
2017-06-16 01:30:00  TODO - s3 link  1497576600000
2017-06-17 01:30:00  TODO - s3 link  1497663000000
2017-06-18 01:30:00  TODO - s3 link  1497749400000
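
The question asked for string week-number keys mapped to lists of the original records rather than DataFrames. A minimal sketch of that shape, assuming pandas 1.1 or later (for `isocalendar()`) and a shortened three-item sample; the names `weeks` and `by_week` are only illustrative:

```python
import pandas as pd

# Shortened sample taken from the question's data
data = [
    {'cover_image': 'TODO - s3 link', 'epoch': 1497403800000},
    {'cover_image': 'TODO - s3 link', 'epoch': 1497749400000},
    {'cover_image': 'TODO - s3 link', 'epoch': 1497835800000},
]

df = pd.DataFrame(data)

# ISO week number of each row, derived from the millisecond epoch column
weeks = pd.to_datetime(df['epoch'], unit='ms').dt.isocalendar().week.astype(int)

# String week number -> list of the original records for that week
by_week = {
    str(week): group.to_dict(orient='records')
    for week, group in df.groupby(weeks)
}
print(by_week)
```

Grouping by an external Series keeps the DataFrame's columns untouched, so each record comes back with only `cover_image` and `epoch`.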

EDIT:

data = [{'cover_image': 'TODO - s3 link', 'epoch': 1497403800000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497490200000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497576600000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497663000000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497749400000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497835800000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1497922200000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1498008600000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1498895000000},
 {'cover_image': 'TODO - s3 link', 'epoch': 1499881400000}]

df = pd.DataFrame(data)
df.index = pd.to_datetime(df['epoch'], unit='ms')
print (df)
                        cover_image          epoch
epoch                                             
2017-06-14 01:30:00  TODO - s3 link  1497403800000
2017-06-15 01:30:00  TODO - s3 link  1497490200000
2017-06-16 01:30:00  TODO - s3 link  1497576600000
2017-06-17 01:30:00  TODO - s3 link  1497663000000
2017-06-18 01:30:00  TODO - s3 link  1497749400000
2017-06-19 01:30:00  TODO - s3 link  1497835800000
2017-06-20 01:30:00  TODO - s3 link  1497922200000
2017-06-21 01:30:00  TODO - s3 link  1498008600000
2017-07-01 07:43:20  TODO - s3 link  1498895000000
2017-07-12 17:43:20  TODO - s3 link  1499881400000

# pd.datetime was removed in pandas 2.0; pd.Timestamp.now() is the equivalent call
now = pd.Timestamp.now()
print (now)
2017-06-14 09:45:25.371940

# ISO week numbers; DatetimeIndex.week was removed in pandas 2.0
weeks = df.index.isocalendar().week.astype(int)
this_week = now.isocalendar()[1]
next_week = (now + pd.Timedelta(days=7)).isocalendar()[1]
# Every week other than the current and the next one is labelled 'future'
map_d = {int(x): 'future' for x in weeks.unique() if x not in [this_week, next_week]}
map_d[this_week] = 'this_week'
map_d[next_week] = 'next_week'
print(map_d)
{26: 'future', 28: 'future', 24: 'this_week', 25: 'next_week'}

d = dict(tuple(df.groupby([map_d[x] for x in weeks])))

print (d['next_week'])
                        cover_image          epoch
epoch                                             
2017-06-19 01:30:00  TODO - s3 link  1497835800000
2017-06-20 01:30:00  TODO - s3 link  1497922200000
2017-06-21 01:30:00  TODO - s3 link  1498008600000

d = {k:v.to_dict(orient='records') for k, v in df.groupby([map_d[x] for x in weeks])}
print (d)
{'future': [{'cover_image': 'TODO - s3 link', 'epoch': 1498895000000}, 
            {'cover_image': 'TODO - s3 link', 'epoch': 1499881400000}], 
'next_week': [{'cover_image': 'TODO - s3 link', 'epoch': 1497835800000}, 
              {'cover_image': 'TODO - s3 link', 'epoch': 1497922200000}, 
              {'cover_image': 'TODO - s3 link', 'epoch': 1498008600000}], 
 'this_week': [{'cover_image': 'TODO - s3 link', 'epoch': 1497403800000}, 
               {'cover_image': 'TODO - s3 link', 'epoch': 1497490200000}, 
               {'cover_image': 'TODO - s3 link', 'epoch': 1497576600000}, 
               {'cover_image': 'TODO - s3 link', 'epoch': 1497663000000}, 
               {'cover_image': 'TODO - s3 link', 'epoch': 1497749400000}]}
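
One caveat with matching on the bare week number: an item in week 24 of a later year would also land under 'this_week'. A possible variant, offered only as a sketch, compares the Monday that starts each item's week with the Monday of the current week instead. The function name `label_weeks` and its `now` parameter are hypothetical, timestamps are treated as naive UTC, and, as in the mapping above, anything that is not in the current or next week (including past items) ends up under 'future'.

```python
import pandas as pd

def label_weeks(data, now):
    """Group records under 'this_week', 'next_week' or 'future' relative to `now`."""
    df = pd.DataFrame(data)
    stamps = pd.to_datetime(df['epoch'], unit='ms')

    # Midnight of the Monday that starts each record's week
    week_start = stamps.dt.normalize() - pd.to_timedelta(stamps.dt.weekday, unit='D')

    # Midnight of the Monday that starts the current week
    now = pd.Timestamp(now)
    now_start = now.normalize() - pd.Timedelta(days=now.weekday())

    # Whole weeks between the current week and each record's week
    offset = (week_start - now_start).dt.days // 7

    names = {0: 'this_week', 1: 'next_week'}
    labels = [names.get(n, 'future') for n in offset]
    return {k: g.to_dict(orient='records') for k, g in df.groupby(labels)}

sample = [
    {'epoch': 1497403800000},  # 2017-06-14
    {'epoch': 1497835800000},  # 2017-06-19
    {'epoch': 1498895000000},  # 2017-07-01
    {'epoch': 1528853400000},  # 2018-06-13, ISO week 24 of the following year
]
result = label_weeks(sample, '2017-06-14 09:45:25')
print(result)
```

Passing `now` in explicitly keeps the function repeatable; `pd.Timestamp.now()` can be supplied at the call site for live use.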

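You can check groups: group name -> group labels (index values) - jezrael
Also, you can check this link - jezrael
d[24] is a string; how do I get the actual tuples? - Hussain
I'm not sure what you have in mind. Can you explain a bit more? What are the expected input and output? - jezrael
Thanks. Got it now. Give me a second. - jezrael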