Operating on multiple data arrays in a rolling window in Python

Question

4

Consider the following code:

import numpy as np

class MyClass(object):

    def __init__(self):

        self.data_a = np.array(range(100))
        self.data_b = np.array(range(100,200))
        self.data_c = np.array(range(200,300))

    def _method_i_do_not_have_access_to(self, data, window, func):

        output = np.empty(np.size(data))

        # apply func to every full window of the input
        for i in range(0, len(data)-window+1):
            output[i] = func(data[i:i+window])

        # the trailing positions have no full window
        output[-window+1:] = np.nan

        return output

    def apply_a(self):

        a = self.data_a

        def _my_func(val):
            return sum(val)

        return self._method_i_do_not_have_access_to(a, 5, _my_func)

my_class = MyClass()
print(my_class.apply_a())

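The method _method_i_do_not_have_access_to takes a numpy array, a window parameter, and a user-defined function handle, and returns an array holding the function's output for each run of window data points in the input array. It is a generic rolling method. I cannot change this method.

As you can see, _method_i_do_not_have_access_to passes a single input to the function handle: a slice of the data array that was passed to _method_i_do_not_have_access_to. The function handle computes its output from window data points of that one array only.

What I need is for _my_func (the function handle passed to _method_i_do_not_have_access_to) to operate on data_b and data_c as well, at the same window indexes as the array that _method_i_do_not_have_access_to passes to _my_func. data_b and data_c are defined globally within the MyClass class (as instance attributes).

The only way I have thought of is to reference data_b and data_c inside _my_func, like this: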

def _my_func(val):
    b = self.data_b
    c = self.data_c
    # do some calculations
    return sum(val)

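However, I need to slice b and c at the same indexes as val, the slice of length window that _method_i_do_not_have_access_to passes in.

For example, if the loop in _method_i_do_not_have_access_to is currently operating on indexes 45 -> 50 of the input array, then _my_func must operate on b and c at those same indexes.

The end result would look something like this: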

def _my_func(val):

    b = self.data_b # somehow identify which slide we are at
    c = self.data_c # somehow identify which slide we are at

    # if _method_i_do_not_have_access_to is currently
    # operating on indexes 45->50, then the sum of 
    # val, b, and c should be the sum of the values at
    # index 45->50 at each

    return sum(val) * sum(b) + sum(c)

Any ideas on how I can accomplish this?

- Jason Strimpel

4 Answers

1

You can pass a two-dimensional array to _method_i_do_not_have_access_to(). len() and slicing still work on it:

In [29]: a = np.arange(100)
In [30]: b = np.arange(100,200)
In [31]: c = np.arange(200,300)
In [32]: data = np.c_[a,b,c] # stack the three 1-D arrays as columns of one 2-D array.

In [35]: data[0:10] # slice operation works.
Out[35]:
array([[  0, 100, 200],
       [  1, 101, 201],
       [  2, 102, 202],
       [  3, 103, 203],
       [  4, 104, 204],
       [  5, 105, 205],
       [  6, 106, 206],
       [  7, 107, 207],
       [  8, 108, 208],
       [  9, 109, 209]])

In [36]: len(data) # len() works.
Out[36]: 100

In [37]: data.shape
Out[37]: (100, 3)

So you can define your _my_func as follows:

def _my_func(val):
    s = np.sum(val, axis=0)
    return s[0]*s[1] + s[2]
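
A minimal end-to-end sketch of this approach, assuming the rolling method behaves exactly like the code in the question. rolling_apply is a hypothetical stand-in for _method_i_do_not_have_access_to, ported to Python 3. One caveat under that assumption: the method sizes its output with np.size(data), which is 300 for a (100, 3) array, so only the first len(data) - window + 1 entries hold real results and the rest should be discarded.

```python
import numpy as np

def rolling_apply(data, window, func):
    # Hypothetical stand-in: same logic as the method in the question.
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output

a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)
data = np.c_[a, b, c]  # shape (100, 3): one column per array

def _my_func(val):
    # val is a (window, 3) slice; sum each column separately
    s = np.sum(val, axis=0)
    return s[0] * s[1] + s[2]

window = 5
raw = rolling_apply(data, window, _my_func)

# Only the entries written by the loop are meaningful here.
result = raw[:len(data) - window + 1]
```

If the real method iterates over a different axis, as the comment below suggests it might, this sketch does not apply as written.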

- HYRY

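This is a nice, clean solution, but unfortunately _method_i_do_not_have_access_to seems to compute only along the last dimension of the input array. That is why I concluded that the only solution is to pass the global arrays into my_func and slice them there. +1 for the concise answer. - Jason Strimpel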

0

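Here is a trick:

Create a new class DataProxy that has a __getitem__ method and proxies the three data arrays (which can be passed to it at initialization). Make func act on DataProxy instances instead of standard numpy arrays, and pass the modified func and the proxy to the inaccessible method.

Does that make sense? The idea is that data is not constrained to be an array; it only needs to be subscriptable. So you can create a custom subscriptable class to stand in for the array.

Example: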

class DataProxy:
    def __init__(self, *data):
        self.data = list(zip(*data))

    def __getitem__(self, item):
        return self.data[item]

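Then create a new DataProxy, passing in as many arrays as you like when you create it, and make func accept the result of indexing that instance. Try it!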
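
A fuller sketch of the proxy idea, under a few assumptions that are not in the answer above. rolling_apply is a hypothetical stand-in that copies the logic of the method in the question. Because that method also calls len(data) and np.size(data), this variant of the proxy adds __len__ and a size attribute (np.size reads an object's size attribute when one exists). It also keeps the arrays separate instead of zipping them, so that indexing with a slice returns one window per array.

```python
import numpy as np

def rolling_apply(data, window, func):
    # Hypothetical stand-in: same logic as the method in the question.
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output

class DataProxy:
    """Subscriptable object that slices several arrays in lockstep."""

    def __init__(self, *arrays):
        self.arrays = [np.asarray(arr) for arr in arrays]
        self.size = len(self.arrays[0])  # picked up by np.size()

    def __len__(self):
        return len(self.arrays[0])

    def __getitem__(self, item):
        # Return the same slice of every proxied array
        return tuple(arr[item] for arr in self.arrays)

def _my_func(windows):
    val, b, c = windows  # one window per proxied array
    return val.sum() * b.sum() + c.sum()

proxy = DataProxy(np.arange(100), np.arange(100, 200), np.arange(200, 300))
out = rolling_apply(proxy, 5, _my_func)
```

Whether the real method accepts a non-array object depends on what else it does with data, which the question does not show.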

- Katriel

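I didn't get it the first three times I read it :) Could you provide an example? - Jason Strimpel

I'll try it tomorrow. I think I'll run into the same problem mentioned in the comment above. - Jason Strimpel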

0

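It looks like _method_i_do_not.. simply applies your function to the data. Could you make the data just an array of indexes? func would then use those indexes to access windows of data_a, data_b, and data_c. There may be faster ways, but I think this adds the least complexity.

In other words, something roughly like this, with any extra processing on window added as needed: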

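def apply_a(self):

    a = self.data_a
    b = self.data_b
    c = self.data_c

    # pass positions instead of values, so each window is a run of indexes
    window_indices = np.arange(len(a))

    def _my_func(window):
        # window holds the indexes of the current slice
        return sum(a[window]) * sum(b[window]) + sum(c[window])

    return self._method_i_do_not_have_access_to(window_indices, 5, _my_func)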
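
The index-array idea can be checked in isolation with a small sketch. rolling_apply is a hypothetical stand-in that copies the logic of the method in the question, and the array names mirror data_a, data_b, and data_c; both are assumptions made for illustration.

```python
import numpy as np

def rolling_apply(data, window, func):
    # Hypothetical stand-in: same logic as the method in the question.
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output

a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)

# Roll over the positions 0..99 rather than over the values of a
window_indices = np.arange(len(a))

def _my_func(idx):
    # idx is an integer array such as [45, 46, 47, 48, 49]
    return a[idx].sum() * b[idx].sum() + c[idx].sum()

out = rolling_apply(window_indices, 5, _my_func)
```

The closure over a, b, and c does the work here; the rolling method only ever sees the index array.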

- senderle

- Voo · Accepted Answer

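The question is: how does _my_func know which indexes to operate on? If you know the indexes in advance when calling the function, the simplest approach is a lambda: lambda val: self._my_func(self.a, self.b, index, val), where _my_func obviously has to change to accept the extra arguments.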
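
A small sketch of the lambda case, assuming the index is known up front. The four-argument my_func signature and the fixed slice are illustrative choices and not part of the original code.

```python
import numpy as np

a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)

def my_func(b, c, index, val):
    # Hypothetical signature: the extra arrays and the index are explicit
    return val.sum() * b[index].sum() + c[index].sum()

index = slice(45, 50)  # assumed to be known before the call

# The lambda fixes b, c and index, leaving a one-argument callable
bound = lambda val: my_func(b, c, index, val)

result = bound(a[index])
```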

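Since you don't know the indexes, you have to write a wrapper that remembers the last index accessed (or, better, captures the slice operator) and stores it in a variable for your function to use.

Edit: I wrote a small example. The code style isn't great, but it should give you the idea: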

class Foo():
    def __init__(self, data1, data2):
        self.data1 = data1
        self.data2 = data2
        self.key = 0      

    def getData(self):
        return Foo.Wrapper(self, self.data2)

    def getKey(self):
        return self.key

    class Wrapper():
        def __init__(self, outer, data):
            self.outer = outer
            self.data = data

        def __getitem__(self, key):
            self.outer.key = key
            return self.data[key]

if __name__ == '__main__':
    data1 = [10, 20, 30, 40]
    data2 = [100, 200, 300, 400]
    foo = Foo(data1, data2)
    wrapped_data2 = foo.getData()
    print(wrapped_data2[2:4])
    print(data1[foo.getKey()])
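
The same wrapper idea can be connected to the rolling method from the question, as a rough sketch under stated assumptions. rolling_apply is a hypothetical stand-in that copies that method's logic, and SliceRecorder is an illustrative name. The wrapper also provides __len__ and a size attribute, because the method calls len(data) and np.size(data).

```python
import numpy as np

def rolling_apply(data, window, func):
    # Hypothetical stand-in: same logic as the method in the question.
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output

class SliceRecorder:
    """Wraps an array and remembers the most recent key used to index it."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.size = self.data.size  # picked up by np.size()
        self.last_key = None

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key):
        self.last_key = key  # e.g. slice(45, 50)
        return self.data[key]

a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)
recorder = SliceRecorder(a)

def _my_func(val):
    # Reuse the slice the rolling method just applied to the wrapper
    key = recorder.last_key
    return val.sum() * b[key].sum() + c[key].sum()

out = rolling_apply(recorder, 5, _my_func)
```

This relies on func being called right after each slice is taken, which holds for the method as shown in the question but is an assumption about any other implementation.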