With more than 2 million records to work with, I immediately noticed a huge difference between Warren Weckesser's solution and Tonsic's (many thanks to both).
Using
first_array
[out]
array([(1633046400299000, 1.34707, 1.34748),
(1633046400309000, 1.347 , 1.34748),
(1633046400923000, 1.347 , 1.34749), ...,
(1635551693846000, 1.36931, 1.36958),
(1635551693954000, 1.36925, 1.36952),
(1635551697902000, 1.3692 , 1.36947)],
dtype=[('timestamp', '<i8'), ('bid', '<f8'), ('ask', '<f8')])
and
second_array
[out]
array([('2021-10-01T00:00:00.299000',), ('2021-10-01T00:00:00.309000',),
('2021-10-01T00:00:00.923000',), ...,
('2021-10-29T23:54:53.846000',), ('2021-10-29T23:54:53.954000',),
('2021-10-29T23:54:57.902000',)], dtype=[('date_time', '<M8[us]')])
I get
%timeit rfn.merge_arrays((first_array, second_array), flatten=True)
[out]
13.8 s ± 1.11 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
and
[out]
2.12 s ± 146 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
which is much better (note the .data added at the end, which avoids getting the mask and fill_value).
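To make the .data remark concrete, here is a minimal sketch. It assumes the function involved is rfn.append_fields (an assumption, since the exact call is not shown above) and uses made-up three-row stand-ins named first_small and second_small. With its default usemask=True, append_fields returns a masked array, and .data gives the plain structured array underneath.

```python
import numpy as np
import numpy.ma as ma
from numpy.lib import recfunctions as rfn

# Made-up three-row stand-ins for first_array and second_array.
first_small = np.array(
    [(1633046400299000, 1.34707, 1.34748),
     (1633046400309000, 1.347, 1.34748),
     (1633046400923000, 1.347, 1.34749)],
    dtype=[('timestamp', '<i8'), ('bid', '<f8'), ('ask', '<f8')])
second_small = np.array(
    [('2021-10-01T00:00:00.299000',),
     ('2021-10-01T00:00:00.309000',),
     ('2021-10-01T00:00:00.923000',)],
    dtype=[('date_time', '<M8[us]')])

# Assumption: append_fields is the call in question. By default
# (usemask=True) the result is a MaskedArray carrying mask and fill_value.
masked = rfn.append_fields(first_small, 'date_time', second_small['date_time'])

# .data is the underlying plain structured ndarray.
plain = masked.data

print(type(masked).__name__, type(plain).__name__)
print(plain.dtype.names)
```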
whereas with something like this
import numpy as np

def building_new(first_array, other_array):
    # Allocate the result once, with the extra date_time field.
    new_array = np.zeros(
        first_array.size,
        dtype=[('timestamp', '<i8'), ('bid', '<f8'), ('ask', '<f8'), ('date_time', '<M8[us]')])
    # Copy the existing fields, then fill in the new one.
    new_array[['timestamp', 'bid', 'ask']] = first_array[['timestamp', 'bid', 'ask']]
    new_array['date_time'] = other_array
    return new_array
(note that in a structured array each row is a single tuple, so size gives the number of rows and works fine here)
I get
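A quick way to check the point about size, using a made-up three-row array (the name small is only for illustration): a 1-D structured array counts each record as one element, no matter how many fields it has.

```python
import numpy as np

# Three records, three fields each.
small = np.zeros(3, dtype=[('timestamp', '<i8'), ('bid', '<f8'), ('ask', '<f8')])

n_records = small.size            # counts records, not individual field values
n_fields = len(small.dtype.names)

print(small.shape, n_records, n_fields)  # (3,) 3 3
```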
%timeit building_new(first_array, second_array)
[out]
67.2 ms ± 3.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The output is the same for all three:
[out]
array([(1633046400299000, 1.34707, 1.34748, '2021-10-01T00:00:00.299000'),
(1633046400309000, 1.347 , 1.34748, '2021-10-01T00:00:00.309000'),
(1633046400923000, 1.347 , 1.34749, '2021-10-01T00:00:00.923000'),
...,
(1635551693846000, 1.36931, 1.36958, '2021-10-29T23:54:53.846000'),
(1635551693954000, 1.36925, 1.36952, '2021-10-29T23:54:53.954000'),
(1635551697902000, 1.3692 , 1.36947, '2021-10-29T23:54:57.902000')],
dtype=[('timestamp', '<i8'), ('bid', '<f8'), ('ask', '<f8'), ('date_time', '<M8[us]')])
One last thought:
when creating the new array instead of using recfunctions, the second array doesn't even need to be structured.
third_array
[out]
array(['2021-10-01T00:00:00.299000', '2021-10-01T00:00:00.309000',
'2021-10-01T00:00:00.923000', ..., '2021-10-29T23:54:53.846000',
'2021-10-29T23:54:53.954000', '2021-10-29T23:54:57.902000'],
dtype='datetime64[us]')
[out]
67 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
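The command behind this last timing is not shown above, so the following is only a sketch of the idea on made-up three-row data: a field of a structured array accepts a plain datetime64 array of matching length. The helper name fill_date_time and the arrays base_small and third_small are hypothetical.

```python
import numpy as np

def fill_date_time(base, dates):
    # Hypothetical helper: extend base's dtype with a date_time field.
    out = np.zeros(base.size, dtype=base.dtype.descr + [('date_time', '<M8[us]')])
    for name in base.dtype.names:
        out[name] = base[name]
    # dates is a plain (unstructured) datetime64 array here.
    out['date_time'] = dates
    return out

base_small = np.array(
    [(1633046400299000, 1.34707, 1.34748),
     (1633046400309000, 1.347, 1.34748),
     (1633046400923000, 1.347, 1.34749)],
    dtype=[('timestamp', '<i8'), ('bid', '<f8'), ('ask', '<f8')])
third_small = np.array(
    ['2021-10-01T00:00:00.299000',
     '2021-10-01T00:00:00.309000',
     '2021-10-01T00:00:00.923000'],
    dtype='datetime64[us]')

result = fill_date_time(base_small, third_small)
print(result.dtype.names)
```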