Get all subsequences of a numpy array

Question

Given a numpy array of size n and an integer m, I want to generate all contiguous subsequences of length m of the array, preferably as a 2-D array.

Example:

>>> subsequences(arange(10), 4)

array([[0, 1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6, 7],
       [2, 3, 4, 5, 6, 7, 8],
       [3, 4, 5, 6, 7, 8, 9]])
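
Stated precisely, the example corresponds to the following array. Here a is the input of length n, and each column j is one window (a_j, ..., a_{j+m-1}):

```latex
S_{ij} = a_{i+j}, \qquad 0 \le i \le m-1, \quad 0 \le j \le n-m
```

So S has shape m x (n - m + 1). For n = 10 and m = 4 that is 4 x 7, which matches the output above.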

The best I've come up with is this:

from numpy import arange, cumsum, ones, vstack

def subsequences(arr, m):
    n = arr.size
    # Create array of indices, essentially solution for "arange" input
    indices = cumsum(vstack((arange(n - m + 1), ones((m-1, n - m + 1), int))), 0)
    return arr[indices]
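
The following sketch shows what the cumsum construction produces. It uses `import numpy as np` rather than bare names, and `cumsum_indices` is a hypothetical helper name. Row 0 holds the window starts and every further row adds 1, so the index array works out to `indices[i, j] == i + j`:

```python
import numpy as np

def cumsum_indices(n, m):
    # Hypothetical helper reproducing the index construction above.
    # Row 0 is 0..n-m; each of the m-1 rows of ones adds 1 when summed
    # down axis 0, so row i becomes i..n-m+i.
    first_row = np.arange(n - m + 1)
    increments = np.ones((m - 1, n - m + 1), dtype=int)
    return np.cumsum(np.vstack((first_row, increments)), axis=0)

indices = cumsum_indices(10, 4)
# The same pattern built directly: column of row offsets + row of window starts.
expected = np.arange(4)[:, None] + np.arange(10 - 4 + 1)
print(indices.shape)                      # (4, 7)
print(np.array_equal(indices, expected))  # True
```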

Is there a better, preferably built-in, function that I'm missing?

- Erik

In the question you say you want subsequences of length m, but in the example m is the number of subsequences, not their length. - logc

@logc I want the columns to be the length-m subsequences, i.e. look at the transpose. - Erik

Possible duplicate of [Efficient Numpy 2D array construction from 1D array]: https://dev59.com/i2445IYBdhLWcg3wUYrM - user2379410

4 Answers

Here's a very fast and memory-efficient approach that returns just a "view" of the original array:

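from numpy.lib.stride_tricks import as_strided

def subsequences(arr, m):
    n = arr.size - m + 1
    # Byte step between consecutive elements; equals arr.itemsize only
    # when arr is contiguous, so strides[0] also handles e.g. arr[::2]
    s = arr.strides[0]
    return as_strided(arr, shape=(m, n), strides=(s, s))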

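If you need to write to this array, call np.copy on it first. Otherwise a write modifies the original array and every entry of the "subsequences" array that shares that memory.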
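
Here is a short demonstration of that caveat, using `np.arange(10)` and m = 4 as illustrative values. A write through the strided view lands in `arr` and shows up at every position (i, j) with the same i + j. `np.copy` yields an independent array:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(10)
s = arr.strides[0]  # byte step between elements (arr is contiguous here)
view = as_strided(arr, shape=(4, arr.size - 4 + 1), strides=(s, s))

print(np.shares_memory(view, arr))  # True: no data was copied

# Writing view[1, 0] writes arr[1], which view[0, 1] also points to.
view[1, 0] = -1
print(arr[1], view[0, 1])           # -1 -1

# A copy owns its data, so modifying it leaves arr alone.
safe = np.copy(view)
safe[0, 0] = 99
print(arr[0])                       # 0
```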

For more information, see: https://dev59.com/i2445IYBdhLWcg3wUYrM#4924433

- user2379410

You're on the right track.

You can use the following broadcasting trick to build a 2-D indices array from two 1-D aranges:

arr = arange(7)[::-1]
arr
=> array([6, 5, 4, 3, 2, 1, 0])
n = arr.size
m = 3

indices = arange(m) + arange(n-m+1).reshape(-1, 1)  # broadcasting rulez
indices
=>
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

arr[indices]
=>
array([[6, 5, 4],
       [5, 4, 3],
       [4, 3, 2],
       [3, 2, 1],
       [2, 1, 0]])
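
The example above produces one window per row. Reshaping the other `arange` into a column gives the question's layout, one window per column, without a transpose. A minimal sketch that keeps the question's `subsequences(arr, m)` signature:

```python
import numpy as np

def subsequences(arr, m):
    """Return an (m, n - m + 1) array whose columns are the length-m windows."""
    n = arr.size
    # (m, 1) column of offsets + (n - m + 1,) row of starts broadcasts to
    # indices[i, j] = i + j.
    indices = np.arange(m)[:, None] + np.arange(n - m + 1)
    return arr[indices]  # fancy indexing returns a new array, not a view

print(subsequences(np.arange(10), 4))
```

This prints the same 4 x 7 array as the question's example. Because fancy indexing copies, the result is safe to modify, at the cost of m * (n - m + 1) elements of memory.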

- shx2

This is definitely better than what I was doing, but using the built-in scipy tool seems to be the best choice. - Erik

Iterator-based:

from itertools import tee, islice
import collections
import numpy as np

# adapted from https://docs.python.org/2/library/itertools.html
def consumed(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
    return iterator


def subsequences(iterable, b):
    return np.array([list(consumed(it, i))[:b] for i, it in enumerate(tee(iterable, len(iterable) - b + 1))]).T

print(subsequences(np.arange(10), 4))

Slice-based:

import numpy as np

def subsequences(iterable, b):
    return np.array([iterable[i:i + b] for i in range(len(iterable) - b + 1)]).T

print(subsequences(np.arange(10), 4))

- Ruggero Turra

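I tried the slice-based approach, but it seems to perform worse than the index-based one. In general, I think converting between numpy arrays and Python lists is more expensive than operating only on numpy data types. Thanks for the suggestion, though! - Erik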
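
You can check this on your own machine by confirming the two versions agree and then timing them with `timeit`. In this sketch, `by_slices` and `by_indices` are hypothetical names. The array size, window length, and call count are arbitrary, and timings will vary:

```python
import timeit
import numpy as np

def by_slices(a, m):
    # One Python-level slice per window, then a list -> array conversion.
    return np.array([a[i:i + m] for i in range(len(a) - m + 1)]).T

def by_indices(a, m):
    # A single fancy-indexing call with a broadcast (m, n - m + 1) index array.
    return a[np.arange(m)[:, None] + np.arange(len(a) - m + 1)]

a = np.arange(10_000)
m = 50
assert np.array_equal(by_slices(a, m), by_indices(a, m))

for f in (by_slices, by_indices):
    t = timeit.timeit(lambda: f(a, m), number=20)
    print(f"{f.__name__}: {t:.4f} s for 20 calls")
```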

Content provided by Stack Overflow (original in English).

- chthonicdaemon · Accepted Answer

scipy.linalg.hankel can do this:

from scipy.linalg import hankel
def subsequences(v, m):
    return hankel(v[:m], v[m-1:])
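
`hankel(v[:m], v[m-1:])` builds H[i, j] = v[i + j], because the last element of the first column equals the first element of the last row. Note that it returns a new array. With NumPy 1.20 or newer, `numpy.lib.stride_tricks.sliding_window_view` applies the same stride trick as the `as_strided` answer and returns a read-only view of shape (n - m + 1, m). Transposing that view gives the question's layout. A sketch, assuming that NumPy version is available:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def subsequences(v, m):
    # sliding_window_view gives shape (n - m + 1, m), one window per row;
    # .T gives (m, n - m + 1), one window per column. No data is copied.
    return sliding_window_view(v, m).T

s = subsequences(np.arange(10), 4)
print(s.shape)            # (4, 7)
print(s.flags.writeable)  # False: use np.copy(s) before modifying
```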