我希望在Python中创建一个高效的循环缓冲区(目标是对缓冲区中的整数值取平均值)。
使用列表收集值是否是高效的方法?
def add_to_buffer( self, num ):
self.mylist.pop( 0 )
self.mylist.append( num )
什么方法更有效(为什么)?
我希望在Python中创建一个高效的循环缓冲区(目标是对缓冲区中的整数值取平均值)。
使用列表收集值是否是高效的方法?
def add_to_buffer( self, num ):
self.mylist.pop( 0 )
self.mylist.append( num )
什么方法更有效(为什么)?
我会使用具有 maxlen
参数的 collections.deque
>>> import collections
>>> d = collections.deque(maxlen=10)
>>> d
deque([], maxlen=10)
>>> for i in xrange(20):
... d.append(i)
...
>>> d
deque([10, 11, 12, 13, 14, 15, 16, 17, 18, 19], maxlen=10)
在文档中有一个关于deque
的示例,与您的要求类似。我认为它是最有效的主要原因是它是由一支非常熟练的团队用C语言实现的,他们总是能够写出一流的代码。
maxlen
后可以 O(1) 随机访问。如果 deque
可以无限增长,则 O(n) 是可以理解的,但是如果给定了 maxlen
,索引元素应该是常量时间。 - lvellalist
、collections.deque
和Numpy.roll
的缓冲区之间切换。请注意,update
方法每次只添加一个值,以保持简单。import numpy
import timeit
import collections
class CircularBuffer(object):
buffer_methods = ('list', 'deque', 'roll')
def __init__(self, buffer_size, buffer_method):
self.content = None
self.size = buffer_size
self.method = buffer_method
def update(self, scalar):
if self.method == self.buffer_methods[0]:
# Use list
try:
self.content.append(scalar)
self.content.pop(0)
except AttributeError:
self.content = [0.] * self.size
elif self.method == self.buffer_methods[1]:
# Use collections.deque
try:
self.content.append(scalar)
except AttributeError:
self.content = collections.deque([0.] * self.size,
maxlen=self.size)
elif self.method == self.buffer_methods[2]:
# Use Numpy.roll
try:
self.content = numpy.roll(self.content, -1)
self.content[-1] = scalar
except IndexError:
self.content = numpy.zeros(self.size, dtype=float)
# Testing and Timing
circular_buffer_size = 100
circular_buffers = [CircularBuffer(buffer_size=circular_buffer_size,
buffer_method=method)
for method in CircularBuffer.buffer_methods]
timeit_iterations = 1e4
timeit_setup = 'from __main__ import circular_buffers'
timeit_results = []
for i, cb in enumerate(circular_buffers):
# We add a convenient number of convenient values (see equality test below)
code = '[circular_buffers[{}].update(float(j)) for j in range({})]'.format(
i, circular_buffer_size)
# Testing
eval(code)
buffer_content = [item for item in cb.content]
assert buffer_content == range(circular_buffer_size)
# Timing
timeit_results.append(
timeit.timeit(code, setup=timeit_setup, number=int(timeit_iterations)))
print '{}: total {:.2f}s ({:.2f}ms per iteration)'.format(
cb.method, timeit_results[-1],
timeit_results[-1] / timeit_iterations * 1e3)
在我的系统上,这会产生以下结果:list: total 1.06s (0.11ms per iteration)
deque: total 0.87s (0.09ms per iteration)
roll: total 6.27s (0.63ms per iteration)
从列表的头部进行弹出操作会导致整个列表被复制,因此效率较低。
相反,您应该使用固定大小的列表/数组和一个索引,在添加/删除项目时移动缓冲区。
根据MoonCactus的回答,这里有一个circularlist
类。与他的版本不同的是,在这里c[0]
将始终给出最早添加的元素,c[-1]
将给出最新添加的元素,c[-2]
将给出倒数第二个...这对应用程序更自然。
c = circularlist(4)
c.append(1); print(c, c[0], c[-1]) #[1] (1/4 items) 1 1
c.append(2); print(c, c[0], c[-1]) #[1, 2] (2/4 items) 1 2
c.append(3); print(c, c[0], c[-1]) #[1, 2, 3] (3/4 items) 1 3
c.append(8); print(c, c[0], c[-1]) #[1, 2, 3, 8] (4/4 items) 1 8
c.append(10); print(c, c[0], c[-1]) #[2, 3, 8, 10] (4/4 items) 2 10
c.append(11); print(c, c[0], c[-1]) #[3, 8, 10, 11] (4/4 items) 3 11
d = circularlist(4, [1, 2, 3, 4, 5]) #[2, 3, 4, 5]
类:
class circularlist(object):
def __init__(self, size, data = []):
"""Initialization"""
self.index = 0
self.size = size
self._data = list(data)[-size:]
def append(self, value):
"""Append an element"""
if len(self._data) == self.size:
self._data[self.index] = value
else:
self._data.append(value)
self.index = (self.index + 1) % self.size
def __getitem__(self, key):
"""Get element by index, relative to the current index"""
if len(self._data) == self.size:
return(self._data[(key + self.index) % self.size])
else:
return(self._data[key])
def __repr__(self):
"""Return string representation"""
return (self._data[self.index:] + self._data[:self.index]).__repr__() + ' (' + str(len(self._data))+'/{} items)'.format(self.size)
c[-1]
始终是正确的元素。 __getitem__
做得很好。 - Basjc=circularlist(5, [1,2])
,我不能执行 c[2]
,但是一旦列表已经满了,就可以执行 c[5]
。其次,c=circularlist(5); c.append(1)
应该产生与 c=circularlist(5, [1])
相同的列表,但实际上并不是这样。在前一种情况下,c.index
是 1
,而在后一种情况下,它是 0
。 - Joost Kremers虽然deque类的使用没问题,但考虑到问题的要求(平均值),这是我的解决方案:
>>> from collections import deque
>>> class CircularBuffer(deque):
... def __init__(self, size=0):
... super(CircularBuffer, self).__init__(maxlen=size)
... @property
... def average(self): # TODO: Make type check for integer or floats
... return sum(self)/len(self)
...
>>>
>>> cb = CircularBuffer(size=10)
>>> for i in range(20):
... cb.append(i)
... print "@%s, Average: %s" % (cb, cb.average)
...
@deque([0], maxlen=10), Average: 0
@deque([0, 1], maxlen=10), Average: 0
@deque([0, 1, 2], maxlen=10), Average: 1
@deque([0, 1, 2, 3], maxlen=10), Average: 1
@deque([0, 1, 2, 3, 4], maxlen=10), Average: 2
@deque([0, 1, 2, 3, 4, 5], maxlen=10), Average: 2
@deque([0, 1, 2, 3, 4, 5, 6], maxlen=10), Average: 3
@deque([0, 1, 2, 3, 4, 5, 6, 7], maxlen=10), Average: 3
@deque([0, 1, 2, 3, 4, 5, 6, 7, 8], maxlen=10), Average: 4
@deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10), Average: 4
@deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], maxlen=10), Average: 5
@deque([2, 3, 4, 5, 6, 7, 8, 9, 10, 11], maxlen=10), Average: 6
@deque([3, 4, 5, 6, 7, 8, 9, 10, 11, 12], maxlen=10), Average: 7
@deque([4, 5, 6, 7, 8, 9, 10, 11, 12, 13], maxlen=10), Average: 8
@deque([5, 6, 7, 8, 9, 10, 11, 12, 13, 14], maxlen=10), Average: 9
@deque([6, 7, 8, 9, 10, 11, 12, 13, 14, 15], maxlen=10), Average: 10
@deque([7, 8, 9, 10, 11, 12, 13, 14, 15, 16], maxlen=10), Average: 11
@deque([8, 9, 10, 11, 12, 13, 14, 15, 16, 17], maxlen=10), Average: 12
@deque([9, 10, 11, 12, 13, 14, 15, 16, 17, 18], maxlen=10), Average: 13
@deque([10, 11, 12, 13, 14, 15, 16, 17, 18, 19], maxlen=10), Average: 14
collections
是标准库的一部分,而numpy
则不是。依赖第三方库将使标准库变得糟糕。 - user719958numpy.roll
返回的是数组的副本,对吗? - djvgempty_like
),然后将交换后的两个部分复制到新数组中。由于每次都要复制整个数组,这是 O(n) 的时间复杂度。 - Bas Swinckels那么Python Cookbook中的解决方案如何?当环形缓冲实例变满时,可以重新分类。
class RingBuffer:
""" class that implements a not-yet-full buffer """
def __init__(self,size_max):
self.max = size_max
self.data = []
class __Full:
""" class that implements a full buffer """
def append(self, x):
""" Append an element overwriting the oldest one. """
self.data[self.cur] = x
self.cur = (self.cur+1) % self.max
def get(self):
""" return list of elements in correct order """
return self.data[self.cur:]+self.data[:self.cur]
def append(self,x):
"""append an element at the end of the buffer"""
self.data.append(x)
if len(self.data) == self.max:
self.cur = 0
# Permanently change self's class from non-full to full
self.__class__ = self.__Full
def get(self):
""" Return a list of elements from the oldest to the newest. """
return self.data
# sample usage
if __name__=='__main__':
x=RingBuffer(5)
x.append(1); x.append(2); x.append(3); x.append(4)
print(x.__class__, x.get())
x.append(5)
print(x.__class__, x.get())
x.append(6)
print(x.data, x.get())
x.append(7); x.append(8); x.append(9); x.append(10)
print(x.data, x.get())
self.__class__
来模拟它。只要两个类具有相同的slots(例如,在这个示例中,对于两个经典类RingBuffer和__Full
而言,它可以正常工作),即使在Python 2.2中也可以运行。出处:Sébastien Keim
#!/usr/bin/env python
import numpy as np
class RingBuffer(object):
def __init__(self, size_max, default_value=0.0, dtype=float):
"""initialization"""
self.size_max = size_max
self._data = np.empty(size_max, dtype=dtype)
self._data.fill(default_value)
self.size = 0
def append(self, value):
"""append an element"""
self._data = np.roll(self._data, 1)
self._data[0] = value
self.size += 1
if self.size == self.size_max:
self.__class__ = RingBufferFull
def get_all(self):
"""return a list of elements from the oldest to the newest"""
return(self._data)
def get_partial(self):
return(self.get_all()[0:self.size])
def __getitem__(self, key):
"""get element"""
return(self._data[key])
def __repr__(self):
"""return string representation"""
s = self._data.__repr__()
s = s + '\t' + str(self.size)
s = s + '\t' + self.get_all()[::-1].__repr__()
s = s + '\t' + self.get_partial()[::-1].__repr__()
return(s)
class RingBufferFull(RingBuffer):
def append(self, value):
"""append an element when buffer is full"""
self._data = np.roll(self._data, 1)
self._data[0] = value
来自Github:
class CircularBuffer:
def __init__(self, size):
"""Store buffer in given storage."""
self.buffer = [None]*size
self.low = 0
self.high = 0
self.size = size
self.count = 0
def isEmpty(self):
"""Determines if buffer is empty."""
return self.count == 0
def isFull(self):
"""Determines if buffer is full."""
return self.count == self.size
def __len__(self):
"""Returns number of elements in buffer."""
return self.count
def add(self, value):
"""Adds value to buffer, overwrite as needed."""
if self.isFull():
self.low = (self.low+1) % self.size
else:
self.count += 1
self.buffer[self.high] = value
self.high = (self.high + 1) % self.size
def remove(self):
"""Removes oldest value from non-empty buffer."""
if self.count == 0:
raise Exception ("Circular Buffer is empty");
value = self.buffer[self.low]
self.low = (self.low + 1) % self.size
self.count -= 1
return value
def __iter__(self):
"""Return elements in the circular buffer in order using iterator."""
idx = self.low
num = self.count
while num > 0:
yield self.buffer[idx]
idx = (idx + 1) % self.size
num -= 1
def __repr__(self):
"""String representation of circular buffer."""
if self.isEmpty():
return 'cb:[]'
return 'cb:[' + ','.join(map(str,self)) + ']'
https://github.com/heineman/python-data-structures/blob/master/2.%20Ubiquitous%20Lists/circBuffer.py