NumPy拼接和合并1D数组

Question

NumPy拼接和合并1D数组

7

我需要拼接数组，但如果它们重叠，还需要将A的末尾与B的开头合并。

[1, 2, 4] + [2, 4, 5] -> [1, 2, 4, 5]
[1, 2, 4] + [2, 5, 4] -> [1, 2, 4, 2, 5, 4]
[1, 2, 4] + [1, 2, 4, 5] -> [1, 2, 4, 5]

注意：元素的顺序必须保留，[4, 5]与[5, 4]不相同。

注意2：问题也可以理解为：我们需要A的最短可能扩展，使其输出以B结尾。

当然，我可以遍历第二个数组并逐个比较元素，但我正在寻找一个漂亮的Numpy解决方案。

- Vidak

1

@Artog 是的，顺序很重要 :) 我更新了问题。 - Vidak

1

你的元素是否可以正确比较？例如，它们总是整数吗？ - 9769953

2

基本上，您想要合并“重叠”的部分：第一个数组的末尾和第二个数组的开头。是这样吗？ - 9769953

1

似乎我误解了，所以我删除了我的答案。我会看看能否想出一个好的解决方案。 - Artog

1

[1, 2, 4, 2]和[2, 4, 5]合并后是否应该返回[1, 2, 4, 2, 4, 5]？最终结果中仍然存在重复的序列，但是这两个数组只在它们的结尾和开头匹配一个2。 - 9769953

显示剩余7条评论

4个回答

1

以下是使用NumPy的解决方案。这并不理想，因为它需要进行（可能不必要的）排序和迭代。排序和迭代都应该在相对较小的数组上进行（甚至可以是单个元素）。

import numpy as np

def merge(left, right):
    """Concatenating two arrays, merging the overlapping end and start of
    the left and right array"""

    # We can limit the search to the maximum possible overlap between
    # the arrays, which is the minimum of the two lengths
    l = min(len(left), len(right))

    # Find all indices in `right` where the element matches the last element of `left`.
    # Need to sort, since the `nonzero` documentation doesn't
    # explicitly state whether the returned indices follow the order
    # as in `right`
    # As long as there are few matches, sorting will not be a showstopper
    # Need to reverse the sorted array, to start from the back of the
    # right array, work towards the front, until there is a proper match
    for i in np.sort(np.nonzero(right[:l] == left[-1])[0])[::-1]:
        # Check if the subarrays are equal
        if np.all(left[-i-1:] == right[:i+1]):
            return np.concatenate([left, right[i+1:]])
    # No match
    return np.concatenate([left, right])


a = np.array([1, 2, 4])
b = np.array([2, 4, 5])
c = np.array([2, 5, 4])
d = np.array([1, 2, 4, 5])
e = np.array([1, 2, 4, 2])
f = np.array([2, 4, 2, 5])

print(merge(a, b))
print(merge(a, c))
print(merge(a, d))
print(merge(e, b))
print(merge(e, f))

这段文本是HTML标记，表示“产生”。

[1 2 4 5]
[1 2 4 2 5 4]
[1 2 4 5]
[1 2 4 2 4 5]
[1 2 4 2 5]

- 9769953

不错的方法，但似乎有点过了:D我还希望我们能想出一些花哨的索引/卷积/相关矢量化解决方案。我已经添加了一个没有numpy的答案（应该是O(n)）。 - Vidak

1

@Vidak 我也会比较答案中给出的各种解决方案，针对你列表的总大小，然后看哪个是最快的。NumPy 在具有大重叠的大数组方面可能会胜出，但也许这对你的情况不相关。 - 9769953

1

我有一个 O(n) 的解决方案，尽管没有使用Numpy：

def merge(a, b):
    n_a = len(a)
    n = min(n_a, len(b))
    m = 0
    for i in range(1, n + 1):
        if b[n - i] == a[n_a - 1 - m]:
            m += 1
        else:
            m = 0
    return a + b[m:]

- Vidak

0

你可以这样做。

def concatenate(a,b):
    ret = a.copy()
    for element in b:
        if not element in ret:
            ret.append(element)
    return ret

这将保持 a + b 的顺序。

- JakobVinkas

这个在第二个例子中失败了 :) - Vidak

你的问题似乎有点奇怪，匹配子列表的长度必须是多少才不会被复制到最终列表中？ - JakobVinkas

需要检查长度为1到len(b)的所有子数组，例如[1,2,4] + [4] = [1,2,4]，以及[1,2,4]+[2,4]=[1,2,4]，最后是[1,2,4]+[1,2,4]=[1,2,4]。 - Vidak

但是你的第二个例子并没有遵循这个规则，就像[1,2,4] + [4] = [1,2,4]，[1,2,4] + [2] = [1,2,4]，然后[1,2,4] + [2,5,4] = [1,2,4,5]。或者我漏掉了什么？ - JakobVinkas

元素的顺序很重要，因此在您的示例中，正确的输出应为[1,2,4,2,5,4]。 - Vidak

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Artog · Accepted Answer

最初我误解了问题。根据我的理解，问题是这样的：

Two item suffix of A matches 2 item prefix of B:
[1, 2, 4] +
   [2, 4, 5] =>
[1, 2, 4, 5]

No suffix of A matches a prefix of B:
[1, 2, 4] + 
         [2, 5, 4] -> 
[1, 2, 4, 2, 5, 4]

然后我们可以使用这个效率极低的函数：

最初的回答

def merge(A,B):
    i = 0
    m = 0
    # Find largest suffix of A that matches the prefix of B with the same length
    while i <= len(A):
        if A[-i:] == B[:i] and i > m:
            m = i
        i += 1
    return A + B[m:]