Average every two adjacent elements and insert the results back into the array

6

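I have an array and want to find the average of each pair of adjacent numbers and add it as an extra element between those two numbers. For example, if I start with the following array: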

x = np.array([1, 3, 5, 7, 9])

I would like to end up with

[1, 2, 3, 4, 5, 6, 7, 8, 9]

How should I get started on this?


Would numpy.interp help here? - some_name.py
Python lists have a built-in *insert()* method that is very easy to use. NumPy also has an insert function, but it is slightly more complicated. https://numpy.org/doc/stable/reference/generated/numpy.insert.html - user2668284
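
To illustrate the numpy.interp suggestion from the comment above, here is a minimal sketch. It assumes the elements can be treated as samples at integer positions 0, 1, ..., n-1; the names `positions`, `query` and `filled` are chosen only for this illustration. Linear interpolation at each half step then equals the mean of the two neighbours.

```python
import numpy as np

x = np.array([1, 3, 5, 7, 9])

# Assumption: the original samples sit at positions 0, 1, ..., n-1.
positions = np.arange(len(x))

# Query every half step: 0, 0.5, 1, ..., n-1 (2n-1 points in total).
query = np.linspace(0, len(x) - 1, 2 * len(x) - 1)

# Linear interpolation at a half step is the mean of the two neighbours.
filled = np.interp(query, positions, x)
print(filled)  # [1. 2. 3. 4. 5. 6. 7. 8. 9.]
```

Note that np.interp returns floats, so cast with `astype` if an integer result is needed.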
6 Answers

6

Here is a simple and efficient way using numpy.repeat:

x = np.array([1, 3, 5, 7, 9])

xx = x.repeat(2)
(xx[1:]+xx[:-1]) / 2
# array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

# or if you want to preserve int dtype
(xx[1:]+xx[:-1]) // 2
# array([1, 2, 3, 4, 5, 6, 7, 8, 9])
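
A short sketch of why the repeat trick works, and of one caveat with `//`. The second array `y` is a made-up example, not from the question: after repeating, adjacent positions alternate between "element with its own copy" and "element with its successor", and floor division rounds down whenever a pair has an odd sum.

```python
import numpy as np

x = np.array([1, 3, 5, 7, 9])
xx = x.repeat(2)           # [1 1 3 3 5 5 7 7 9 9]

# Even positions pair an element with its own copy (mean is the element),
# odd positions pair an element with its successor (mean is the midpoint).
pair_sums = xx[:-1] + xx[1:]   # [ 2  4  6  8 10 12 14 16 18]

# Made-up data whose adjacent pairs have odd sums.
y = np.array([1, 4, 9])
yy = y.repeat(2)
floored = (yy[:-1] + yy[1:]) // 2   # [1 2 4 6 9]
exact = (yy[:-1] + yy[1:]) / 2      # [1.  2.5 4.  6.5 9. ]
```

So the integer variant is only lossless when every adjacent pair has an even sum, as in the question's example.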

A simple benchmark script:
import numpy as np
from numpy.lib import stride_tricks
from itertools import zip_longest

def Brandt(x,forceint=False):
    y = np.diff(x)/2 + x[:-1]
    z = [n for pair in zip_longest(x,y) for n in pair if n]
    return np.asarray(z, int) if forceint else np.asarray(z)

def Ch3steR(x):
    strd = x.strides[0]
    vals = stride_tricks.as_strided(x, shape=(len(x) - 1, 2),
                                    strides=(strd, strd))
    means = vals.mean(axis=1)
    return np.insert(x, np.arange(1, len(x)), means)

def moving_average(x, w):
    return np.convolve(x, np.ones(w), 'valid') / w
def Tankred(x):
    return np.insert(x, np.arange(1, len(x)), moving_average(x, 2))

def fskj(x):
    avg = (x[:-1] + x[1:]) / 2
    zipped  = np.stack((x[:-1], avg), -1)
    flattened = zipped.flatten()
    return np.append(flattened, x[-1])

def user1740577(x):
    for i in np.arange(0,len(x)+2,2):
        x = np.insert(x,i+1,np.average(x[i:i+2]))    
    return x
        
def loopywalt(x,forceint=False):
    xx = x.repeat(2)
    return (xx[:-1]+xx[1:]) // 2 if forceint else (xx[:-1]+xx[1:]) / 2

all_ = (Brandt,Ch3steR,Tankred,fskj,user1740577,loopywalt)
blacklist=[]
from timeit import timeit
rng = np.random.default_rng(seed=1)
for ex in [np.array([1,3,5,7,9]),rng.integers(1,1000,1000),
           rng.integers(1,1000,1000000)]:
    print();print("n =",len(ex))
    for method in all_:
        if method in blacklist:
            continue
        t = timeit(lambda:method(ex),number=10)
        if t<0.1:
            t = timeit(lambda:method(ex),number=1000)
        else:
            blacklist.append(method)
            t *= 100
        print(method.__name__,t,'ms')

Results:

n = 5
Brandt 0.018790690000969335 ms
Ch3steR 0.06143478500052879 ms
Tankred 0.039249178998943535 ms
fskj 0.026057840999783366 ms
user1740577 0.15504688399960287 ms
loopywalt 0.0033979790005105315 ms

n = 1000
Brandt 0.4772341360003338 ms
Ch3steR 0.10018322700125282 ms
Tankred 0.0674891500002559 ms
fskj 0.03475799899933918 ms
user1740577 17.72124929993879 ms
loopywalt 0.017431922000469058 ms

n = 1000000
Brandt 491.9887762000144 ms
Ch3steR 56.97805079998943 ms
Tankred 44.63849610001489 ms
fskj 25.709937600004196 ms
loopywalt 20.622111500051687 ms

3
You can use numpy.insert together with a moving average to fill in the missing values:
import numpy as np

x = np.array([1, 3, 5, 7, 9])

# copied from: https://dev59.com/IWYq5IYBdhLWcg3wpyNE#54628145
def moving_average(x, w):
    return np.convolve(x, np.ones(w), 'valid') / w

x_filled = np.insert(x, np.arange(1, len(x)), moving_average(x, 2))

x_filled: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
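
One thing to watch with this approach: as far as I understand the numpy.insert documentation, inserted values are converted to the dtype of the target array, which is why the result above is an integer array. The sketch below uses made-up input (`[1, 4, 9]`) to show the averages being truncated unless the array is converted to float first.

```python
import numpy as np

def moving_average(x, w):
    return np.convolve(x, np.ones(w), 'valid') / w

# Made-up data whose pairwise means are not integers.
x = np.array([1, 4, 9])
avg = moving_average(x, 2)       # [2.5 6.5]
idx = np.arange(1, len(x))

# Inserting into an int array: the fractional parts are dropped.
as_int = np.insert(x, idx, avg)                   # [1 2 4 6 9]

# Converting to float first keeps the averages exactly.
as_float = np.insert(x.astype(float), idx, avg)   # [1.  2.5 4.  6.5 9. ]
```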


1

Another fast version writes the original data and the averages into a new array using slice assignment:

def Kelly(x):
    avg = (x[1:] + x[:-1]) / 2
    res = np.empty(x.size + avg.size)
    res[::2] = x
    res[1::2] = avg
    return res
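
A quick sanity check, as a sketch: the slice-assignment approach and the numpy.repeat approach should produce identical float output. The function names `interleave_means` and `repeat_means` and the sample array are chosen only for this illustration.

```python
import numpy as np

def interleave_means(x):
    # Same idea as the slice-assignment version: originals go to the even
    # slots, pairwise means to the odd slots.
    avg = (x[1:] + x[:-1]) / 2
    res = np.empty(x.size + avg.size)   # float64 by default
    res[::2] = x
    res[1::2] = avg
    return res

def repeat_means(x):
    # The numpy.repeat approach, float variant.
    xx = x.repeat(2)
    return (xx[1:] + xx[:-1]) / 2

sample = np.array([1, 3, 5, 7, 9, 10])
a = interleave_means(sample)
b = repeat_means(sample)
same = np.array_equal(a, b)
```

Both results have length 2n-1 and dtype float64; the slice version avoids the temporary array of length 2n that `repeat` creates.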

Benchmark using loopy walt's script (I had to remove default_rng and use randint to get it to run):

n = 5
Brandt 0.023532982973847538 ms
Ch3steR 0.05084541701944545 ms
Tankred 0.029509164043702185 ms
fskj 0.01449447899358347 ms
user1740577 0.11903033603448421 ms
loopywalt 0.002962342055980116 ms
Kelly 0.004625919042155147 ms

n = 1000
Brandt 0.415388774999883 ms
Ch3steR 0.11717381200287491 ms
Tankred 0.07865125295938924 ms
fskj 0.026592836948111653 ms
user1740577 15.592256403760985 ms
loopywalt 0.02348607504973188 ms
Kelly 0.009647938015405089 ms

n = 1000000
Brandt 531.4903213002253 ms
Ch3steR 139.16819099686109 ms
Tankred 125.81092769978568 ms
fskj 63.73856549616903 ms
loopywalt 55.087829200783744 ms
Kelly 14.159472199389711 ms

1
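You can use numpy.lib.stride_tricks.as_strided [1] to compute the mean of each pair of adjacent values, then use np.insert to insert those means into the array.
import numpy as np
from numpy.lib import stride_tricks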
x = np.array([1, 3, 5, 7, 9])
strd = x.strides[0]
vals = stride_tricks.as_strided(x, shape=(len(x) - 1, 2), strides=(strd, strd))

# print(vals)
# [[1 3]
#  [3 5]
#  [5 7]
#  [7 9]]

means = vals.mean(axis=1)
print(means)
# [2. 4. 6. 8.]

np.insert(x, np.arange(1, len(x)), means)
# array([1, 2, 3, 4, 5, 6, 7, 8, 9])

[1] For more details on strides, see Rick M.'s post on how to understand NumPy strides for a layman.
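
As an alternative sketch (not part of the original answer): NumPy 1.20 and later provide `sliding_window_view` in `numpy.lib.stride_tricks`, which builds the same pairwise windows without computing strides by hand. This assumes a recent enough NumPy is installed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

x = np.array([1, 3, 5, 7, 9])

# Read-only view of every length-2 window: [[1 3] [3 5] [5 7] [7 9]]
windows = sliding_window_view(x, 2)
means = windows.mean(axis=1)             # [2. 4. 6. 8.]

# Insert each mean before index 1, 2, ..., n-1 of the original array.
result = np.insert(x, np.arange(1, len(x)), means)
```

Because the view is read-only and its shape is derived from the array, it avoids the out-of-bounds risk that comes with passing a wrong shape or stride to `as_strided`.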


1
Simply:
import numpy as np
x = np.array([1, 3, 5, 7, 9])

# Use `itertools.zip_longest` to wrap averages and inputs together
from itertools import zip_longest

# compute the averages
y = np.diff(x)/2 + x[:-1]

# mix them (order, in this case)
z = [n for pair in zip_longest(x,y) for n in pair if n is not None]

# make it a numpy-array (of ints)
np.asarray(z, int)
# array([1, 2, 3, 4, 5, 6, 7, 8, 9])
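
A note on filtering the `zip_longest` padding: a truthiness test such as `if n` removes the `None` padding but also any genuine zero, whereas `is not None` removes only the padding. A small sketch with made-up values (`values` and `midpoints` are illustrative names):

```python
from itertools import zip_longest

# Made-up data that contains a legitimate zero.
values = [0, 2, 4]
midpoints = [1.0, 3.0]

# Truthiness drops the padding None, but also the legitimate 0.
by_truthiness = [n for pair in zip_longest(values, midpoints)
                 for n in pair if n]

# Comparing against None removes only the padding.
by_identity = [n for pair in zip_longest(values, midpoints)
               for n in pair if n is not None]
```

Here `by_truthiness` loses the leading 0 while `by_identity` keeps all five elements.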

1
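I'm not sure mixing numpy and plain Python like that counts as Pythonic. It will certainly be slow. - loopy walt
@loopywalt Haha, your solution is really nice; I didn't know about numpy.repeat. Thanks for the comparison too, very cool... I knew that unrolling with a list comprehension like this would hurt performance... cheers - Brandt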

0
import numpy as np
x = np.array([1, 3, 5, 7, 9])
avg = (x[:-1] + x[1:]) / 2 # calculate average value of all consecutive pairs: [2, 4, 6, 8]
zipped  = np.stack((x[:-1], avg), -1) # zip x and avg, except for last element in x: [[1, 2], [3, 4], [5, 6], [7, 8]]
flattened = zipped.flatten() # Flatten to form 1-d array: [1, 2, 3, 4, 5, 6, 7, 8]
requested_result = np.append(flattened, x[-1]) # Add last element of x: [1, 2, 3, 4, 5, 6, 7, 8, 9]
