遍历numpy数组然后在另一个数组中索引值

3

我很难让这段代码正常工作。我想要遍历一个numpy数组,根据结果在另一个numpy数组中找到相应的值,并将该值存储在基于该值的新位置。

     # Convert the sediment transport and the flow direction rasters into Numpy arrays
    sediment_transport_np = arcpy.RasterToNumPyArray(sediment_transport_convert, '#', '#', '#', -9999)
    flow_direction_np = arcpy.RasterToNumPyArray(flow_direction_convert, '#', '#', '#', -9999)

    [rows,cols]= sediment_transport_np.shape
    elevation_change = np.zeros((rows,cols), np.float)

   # Main body for calculating elevation change        

    # Attempt 1
    for [i, j], flow in np.ndenumerate(flow_direction_np):
        if flow == 32:
            elevation_change[i, j]  = sediment_transport_np[i - 1, j - 1]
        elif flow == 16:
            elevation_change[i, j] = sediment_transport_np[i, j - 1]
        elif flow == 8:
            elevation_change[i, j] = sediment_transport_np[i + 1, j - 1]
        elif flow == 4:
            elevation_change[i, j] = sediment_transport_np[i + 1, j]
        elif flow == 64:
            elevation_change[i, j] = sediment_transport_np[i - 1, j]
        elif flow == 128:
            elevation_change[i, j] = sediment_transport_np[i - 1, j + 1]
        elif flow == 1:
            elevation_change[i, j] = sediment_transport_np[i, j + 1]
        elif flow == 2:
            elevation_change[i, j] = sediment_transport_np[i + 1, j + 1]

将Numpy数组转换回栅格

    elevation_change_raster = arcpy.NumPyArrayToRaster(elevation_change, bottom_left_corner, raster_cell_width, raster_cell_height, -9999)

    elevation_change_raster.save(output_raster)

我收到的错误信息如下:
运行脚本elevation_change...
Traceback (most recent call last): File "", line 606, in execute IndexError: 索引(655)超出范围(0<=index<655),在维度0中
未能执行(elevation_change)

2
你之所以出现错误,是因为你试图超出 sediment_transport 网格的边界(例如 i+1j+1 部分)。 当你处于网格边界时,你正在尝试获取一个不存在的值。 你想在边界发生什么? 你需要以某种方式实现边界条件。 此外,它没有引发错误,但当你在 i=0j=0 时,你当前正在获取 相反 的边缘(由于 i-1j-1 部分)。 - Joe Kington
理论上,这个脚本应该只能在 sediment_transport 格子内选择,如果可能的话,我可以把边界设为边缘,即如果它试图选择不存在的边界,则无法选择,而是返回值0。 - Nick Jones
请参考以下流向:http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//009z00000052000000.htm。流向中的每个值都对应一个方向 - 左、右、上、下等等... 在边缘处,它们似乎总是指向内部的单元格!- 编辑 - 我发现在寻找沉积物传输层外的值时出现了单元格,因此我需要设置边界条件 - 任何提示将不胜感激。 - Nick Jones
1个回答

6

问题原因

这个错误是因为您试图超出sediment_transport网格的边界(例如i+1和j+1部分)进行索引。目前,当您位于网格边界时,您正在尝试获取一个不存在的值。此外,当i=0或j=0时,您当前正在获取相反的边缘(由于i-1和j-1部分),尽管它没有引发错误。

您提到希望在边界处将elevation_change的值设为0(这似乎是合理的)。另一个常见的边界条件是“包裹”值并从相反的边缘获取值。在这种情况下可能没有太多意义,但我将在几个示例中展示它,因为使用一些方法可以很容易地实现它。

诱人但不正确的方法

只需捕获异常并将值设置为0是很诱人的。例如:

for [i, j], flow in np.ndenumerate(flow_direction_np):
    try:
        if flow == 32:
            ...
        elif ...
            ...
    except IndexError:
        elevation_change[i, j] = 0

然而,这种方法实际上是错误的。负索引是有效的,并将返回网格的相反边缘。因此,这基本上实现了网格右边缘和底部的“零”边界条件,以及左侧和顶部的“环绕”边界条件。
使用零填充
在“零”边界条件的情况下,有一种非常简单的方法可以避免索引问题:用零填充sediment_transport网格。这样,如果我们超出原始网格的边缘进行索引,我们将获得一个0。(或者您想用来填充数组的任何恒定值。)
副笔记:这是使用numpy.pad的完美场所。然而,它是在v1.7中添加的。我将跳过在此处使用它,因为OP提到了ArcGIS,而Arc不随附最新版本的numpy。
例如:
padded_transport = np.zeros((rows + 2, cols + 2), float)
padded_transport[1:-1, 1:-1] = sediment_transport  
# The two lines above could be replaced with:
#padded_transport = np.pad(sediment_transport, 1, mode='constant')

for [i, j], flow in np.ndenumerate(flow_direction):
    # Need to take into account the offset in the "padded_transport"
    r, c = i + 1, j + 1

    if flow == 32:
        elevation_change[i, j] = padded_transport[r - 1, c - 1]
    elif flow == 16:
        elevation_change[i, j] = padded_transport[r, c - 1]
    elif flow == 8:
        elevation_change[i, j] = padded_transport[r + 1, c - 1]
    elif flow == 4:
        elevation_change[i, j] = padded_transport[r + 1, c]
    elif flow == 64:
        elevation_change[i, j] = padded_transport[r - 1, c]
    elif flow == 128:
        elevation_change[i, j] = padded_transport[r - 1, c + 1]
    elif flow == 1:
        elevation_change[i, j] = padded_transport[r, c + 1]
    elif flow == 2:
        elevation_change[i, j] = padded_transport[r + 1, c + 1]

DRY (不要重复自己)

我们可以通过使用 dict 更紧凑地编写此代码:

elevation_change = np.zeros_like(sediment_transport)
nrows, ncols = flow_direction.shape
lookup = {32: (-1, -1),
          16:  (0, -1), 
          8:   (1, -1),
          4:   (1,  0),
          64: (-1,  0),
          128:(-1,  1),
          1:   (0,  1),
          2:   (1,  1)}

padded_transport = np.zeros((nrows + 2, ncols + 2), float)
padded_transport[1:-1, 1:-1] = sediment_transport  

for [i, j], flow in np.ndenumerate(flow_direction):
    # Need to take into account the offset in the "padded_transport"
    r, c = i + 1, j + 1
    # This also allows for flow_direction values not listed above...
    dr, dc = lookup.get(flow, (0,0))
    elevation_change[i,j] = padded_transport[r + dr, c + dc]

现在,给原始数组填充有点多余。如果您使用numpy.pad,通过填充实现不同的边界条件非常容易,但我们也可以直接编写逻辑:

elevation_change = np.zeros_like(sediment_transport)
nrows, ncols = flow_direction.shape
lookup = {32: (-1, -1),
          16:  (0, -1), 
          8:   (1, -1),
          4:   (1,  0),
          64: (-1,  0),
          128:(-1,  1),
          1:   (0,  1),
          2:   (1,  1)}

for [i, j], flow in np.ndenumerate(flow_direction):
    dr, dc = lookup.get(flow, (0,0))
    r, c = i + dr, j + dc
    if not ((0 <= r < nrows) & (0 <= c < ncols)):
        elevation_change[i,j] = 0
    else:
        elevation_change[i,j] = sediment_transport[r, c]

向量化计算

在Python中遍历NumPy数组相当缓慢,原因我不会在这里深入探讨。因此,在NumPy中有更有效的实现方法。诀窍是使用numpy.roll和布尔索引。

对于“环绕”边界条件,只需:

elevation_change = np.zeros_like(sediment_transport)
nrows, ncols = flow_direction.shape
lookup = {32: (-1, -1),
          16:  (0, -1), 
          8:   (1, -1),
          4:   (1,  0),
          64: (-1,  0),
          128:(-1,  1),
          1:   (0,  1),
          2:   (1,  1)}

for value, (row, col) in lookup.iteritems():
    mask = flow_direction == value
    shifted = np.roll(mask, row, 0)
    shifted = np.roll(shifted, col, 1)
    elevation_change[mask] = sediment_transport[shifted]

return elevation_change

如果您不熟悉numpy,这可能看起来有点像希腊语。这里有两个部分。第一个是使用布尔索引。以下是一个快速示例:

In [1]: import numpy as np

In [2]: x = np.arange(9).reshape(3,3)

In [3]: x
Out[3]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [4]: mask = np.array([[False, False, True],
...                      [True, False, False],
...                      [True, False, False]])


In [5]: x[mask]
Out[5]: array([2, 3, 6])

正如您所看到的,如果我们使用与布尔网格形状相同的数组进行索引,则将返回为True的值。同样,您也可以通过这种方式设置值。
下一个技巧是 "numpy.roll"。这将在一个方向上将值移动给定数量。它们会在边缘处“环绕”。
In [1]: import numpy as np

In [2]: np.array([[0,0,0],[0,1,0],[0,0,0]])
Out[2]: 
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])

In [3]: x = _

In [4]: np.roll(x, 1, axis=0)
Out[4]: 
array([[0, 0, 0],
       [0, 0, 0],
       [0, 1, 0]])

In [5]: np.roll(x, 1, axis=1)
Out[5]: 
array([[0, 0, 0],
       [0, 0, 1],
       [0, 0, 0]])

希望这有点意义。为了实现“零”边界条件(或使用numpy.pad实现任意边界条件),我们需要像这样做:
def vectorized(flow_direction, sediment_transport):
    elevation_change = np.zeros_like(sediment_transport)
    nrows, ncols = flow_direction.shape
    lookup = {32: (-1, -1),
              16:  (0, -1), 
              8:   (1, -1),
              4:   (1,  0),
              64: (-1,  0),
              128:(-1,  1),
              1:   (0,  1),
              2:   (1,  1)}

    # Initialize an array for the "shifted" mask
    shifted = np.zeros((nrows+2, ncols+2), dtype=bool)

    # Pad "sediment_transport" with zeros
    # Again, `np.pad` would be better and more flexible here, as it would
    # easily allow lots of different boundary conditions...
    tmp = np.zeros((nrows+2, ncols+2), sediment_transport.dtype)
    tmp[1:-1, 1:-1] = sediment_transport
    sediment_transport = tmp

    for value, (row, col) in lookup.iteritems():
        mask = flow_direction == value

        # Reset the "shifted" mask
        shifted.fill(False)
        shifted[1:-1, 1:-1] = mask

        # Shift the mask by the right amount for the given value
        shifted = np.roll(shifted, row, 0)
        shifted = np.roll(shifted, col, 1)

        # Set the values in elevation change to the offset value in sed_trans
        elevation_change[mask] = sediment_transport[shifted]

    return elevation_change

矢量化的优势

“矢量化”的版本速度更快,但会使用更多的内存。

对于一个1000x1000的网格:

In [79]: %timeit vectorized(flow_direction, sediment_transport)
10 loops, best of 3: 170 ms per loop

In [80]: %timeit iterate(flow_direction, sediment_transport)
1 loops, best of 3: 5.36 s per loop

In [81]: %timeit lookup(flow_direction, sediment_transport)
1 loops, best of 3: 3.4 s per loop

这些结果是通过与随机生成的数据进行比较来得出的以下实现:
import numpy as np

def main():
    # Generate some random flow_direction and sediment_transport data...
    nrows, ncols = 1000, 1000
    flow_direction = 2 ** np.random.randint(0, 8, (nrows, ncols))
    sediment_transport = np.random.random((nrows, ncols))

    # Make sure all of the results return the same thing...
    test1 = vectorized(flow_direction, sediment_transport)
    test2 = iterate(flow_direction, sediment_transport)
    test3 = lookup(flow_direction, sediment_transport)
    assert np.allclose(test1, test2)
    assert np.allclose(test2, test3)


def vectorized(flow_direction, sediment_transport):
    elevation_change = np.zeros_like(sediment_transport)
    sediment_transport = np.pad(sediment_transport, 1, mode='constant')
    lookup = {32: (-1, -1),
              16:  (0, -1), 
              8:   (1, -1),
              4:   (1,  0),
              64: (-1,  0),
              128:(-1,  1),
              1:   (0,  1),
              2:   (1,  1)}

    for value, (row, col) in lookup.iteritems():
        mask = flow_direction == value
        shifted = np.pad(mask, 1, mode='constant')
        shifted = np.roll(shifted, row, 0)
        shifted = np.roll(shifted, col, 1)
        elevation_change[mask] = sediment_transport[shifted]

    return elevation_change

def iterate(flow_direction, sediment_transport):
    elevation_change = np.zeros_like(sediment_transport)
    padded_transport = np.pad(sediment_transport, 1, mode='constant')  

    for [i, j], flow in np.ndenumerate(flow_direction):
        r, c = i + 1, j + 1
        if flow == 32:
            elevation_change[i, j] = padded_transport[r - 1, c - 1]
        elif flow == 16:
            elevation_change[i, j] = padded_transport[r, c - 1]
        elif flow == 8:
            elevation_change[i, j] = padded_transport[r + 1, c - 1]
        elif flow == 4:
            elevation_change[i, j] = padded_transport[r + 1, c]
        elif flow == 64:
            elevation_change[i, j] = padded_transport[r - 1, c]
        elif flow == 128:
            elevation_change[i, j] = padded_transport[r - 1, c + 1]
        elif flow == 1:
            elevation_change[i, j] = padded_transport[r, c + 1]
        elif flow == 2:
            elevation_change[i, j] = padded_transport[r + 1, c + 1]
    return elevation_change

def lookup(flow_direction, sediment_transport):
    elevation_change = np.zeros_like(sediment_transport)
    nrows, ncols = flow_direction.shape
    lookup = {32: (-1, -1),
              16:  (0, -1), 
              8:   (1, -1),
              4:   (1,  0),
              64: (-1,  0),
              128:(-1,  1),
              1:   (0,  1),
              2:   (1,  1)}

    for [i, j], flow in np.ndenumerate(flow_direction):
        dr, dc = lookup.get(flow, (0,0))
        r, c = i + dr, j + dc
        if not ((0 <= r < nrows) & (0 <= c < ncols)):
            elevation_change[i,j] = 0
        else:
            elevation_change[i,j] = sediment_transport[r, c]

    return elevation_change

if __name__ == '__main__':
    main()

谢谢你的帮助,乔。我对numpy / python非常陌生,只编码了几周。我目前在这一行上遇到问题:padded_transport[1:-1, 1:-1] = sediment_transport_np
文件“<string>”,第618行,执行时出现ValueError: operands could not be broadcast together with shapes (654,1023) (655,1024),有什么想法吗?
- Nick Jones
糟糕!这是我的笔误。我应该写成 padded_transport = np.zeros((rows + 2, cols + 2), float)。这就是我写无法测试的代码的后果... - Joe Kington
太棒了Joe,谢谢!我刚从酒吧回来,还没来得及试一下,但你真的帮了我大忙 :-) 非常感谢! - Nick Jones
我添加了更多的解释(并删除了我之前发布的不正确的解决方案)。希望它能有所帮助,而且不会太令人困惑! - Joe Kington
谢谢Joe,这对我很有帮助。由于我编程的时间很短,有些地方还不太理解,但我相信随着时间的推移会慢慢明白的!目前我没有在循环中遇到任何问题。不确定你是否了解arcpy和python - http://gis.stackexchange.com/questions/63725/while-looping-with-arcpy-error-010240-could-not-save-raster-dataset 我已经在GIS stackexchange上发布了它。如果你有任何想法,将不胜感激! - Nick Jones
嗨,乔,我一直在尝试向量化,但似乎无法使其正常工作。我在这里提出了一个问题:http://stackoverflow.com/questions/17744481/changing-something-from-iterating-over-a-numpy-array-to-vectorization - Nick Jones

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接