Pandas: Creating a tuple column from multiple columns

4

I have the following DataFrame, my_df:

Person       event         time
---------------------------------
John          A        2017-10-11
John          B        2017-10-12
John          C        2017-10-14
John          D        2017-10-15
Ann           X        2017-09-01
Ann           Y        2017-09-02
Dave          M        2017-10-05
Dave          N        2017-10-07
Dave          Q        2017-10-20

I would like to create a new column that contains the (event, time) pair for each row. It should look like this:

Person       event         time        event_time
------------------------------------------------------
John          A        2017-10-11     (A, 2017-10-11)
John          B        2017-10-12     (B, 2017-10-12)
John          C        2017-10-14     (C, 2017-10-14)
John          D        2017-10-15     (D, 2017-10-15)
Ann           X        2017-09-01     (X, 2017-09-01)
Ann           Y        2017-09-02     (Y, 2017-09-02)
Dave          M        2017-10-05     (M, 2017-10-05)
Dave          N        2017-10-07     (N, 2017-10-07)
Dave          Q        2017-10-20     (Q, 2017-10-20)

Here is my code:

my_df['event_time'] = my_df.apply(lambda row: (row['event'] , row['time']), axis=1)

But I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4309         blocks = form_blocks(arrays, names, axes)
-> 4310         mgr = BlockManager(blocks, axes)
   4311         mgr._consolidate_inplace()

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2794         if do_integrity_check:
-> 2795             self._verify_integrity()
   2796 

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in _verify_integrity(self)
   3005             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3006                 construction_error(tot_items, block.shape[1:], self.axes)
   3007         if len(self.items) != tot_items:

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 

ValueError: Shape of passed values is (128, 2), indices imply (128, 3)

Any idea what I am doing wrong in my code? Thanks!
3 Answers

6

You can use:

my_df['event_time'] = my_df[['event','time']].apply(tuple, axis=1)

Or:

my_df['event_time'] = tuple(zip(my_df['event'], my_df['time']))

Or:

my_df['event_time'] = [tuple(x) for x in my_df[['event','time']].values.tolist()]

All of them return:

print (my_df)
  Person event        time       event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)
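For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The three-row sample frame is made up for illustration (only the column names come from the question), and it uses list(zip(...)) for the zip variant. It builds the pairs in all three ways so they can be compared side by side.

```python
import pandas as pd

# Hypothetical three-row sample that mirrors the question's column names
my_df = pd.DataFrame({
    "Person": ["John", "John", "Ann"],
    "event": ["A", "B", "X"],
    "time": ["2017-10-11", "2017-10-12", "2017-09-01"],
})

# 1) row-wise apply on just the two columns of interest
via_apply = my_df[["event", "time"]].apply(tuple, axis=1).tolist()

# 2) zip the two columns together
via_zip = list(zip(my_df["event"], my_df["time"]))

# 3) convert the underlying values to nested lists, then to tuples
via_values = [tuple(x) for x in my_df[["event", "time"]].values.tolist()]

# Store one of the (equivalent) results as the new column
my_df["event_time"] = via_zip
print(my_df)
```

Selecting only the two needed columns before calling apply keeps each row's result limited to the (event, time) pair.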

Yes, I have some events that are marked as 'None' but still have a timestamp. I would like the corresponding tuple to be (None, timestamp). - Edamame
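Regarding the missing-event case in the comment above, a small sketch may help. It assumes the missing events are stored as Python None in an object-dtype column; the sample rows are invented for illustration. Under that assumption, zipping the columns carries None through into the tuple unchanged.

```python
import pandas as pd

# Hypothetical sample: the second event is missing but still has a timestamp.
# dtype=object is set explicitly so that None is stored as-is.
df = pd.DataFrame({
    "event": pd.Series(["A", None, "C"], dtype=object),
    "time": ["2017-10-11", "2017-10-12", "2017-10-14"],
})

# zip pairs the values element by element, including the None
df["event_time"] = list(zip(df["event"], df["time"]))

second = df["event_time"].iloc[1]
print(second)
```

If the missing values are stored in some other form (for example NaN or the string 'None'), the tuple will contain that value instead.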

2
Without apply:
df.assign(event_time=list(zip(df.event,df.time)))
Out[1011]: 
  Person event        time        event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)

0
my_df['event_time'] = my_df.apply(lambda x: tuple(x[['event','time']]),axis = 1)

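If you prefer to keep a lambda, this would be my approach. Be aware that row-wise apply is generally not faster than the zip-based versions.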
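To check the efficiency question on your own machine, here is a rough timing sketch. The 1,000-row frame and the helper names with_apply and with_zip are made up for illustration. Timings depend on the machine and the pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd

# Hypothetical data: 1,000 rows using the question's column names
n = 1000
df = pd.DataFrame({
    "event": ["A"] * n,
    "time": ["2017-10-11"] * n,
})


def with_apply():
    # Row-wise lambda, as in the answer above
    return df.apply(lambda x: tuple(x[["event", "time"]]), axis=1).tolist()


def with_zip():
    # Column-wise zip, as in the other answers
    return list(zip(df["event"], df["time"]))


# Run each variant a few times and report the total elapsed time
t_apply = timeit.timeit(with_apply, number=3)
t_zip = timeit.timeit(with_zip, number=3)
print(f"apply: {t_apply:.4f}s  zip: {t_zip:.4f}s")
```

Both helpers produce the same list of tuples, so the comparison is only about speed.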

