Most Pythonic way to interleave two strings

Question

117

What is the most Pythonic way to mesh (interleave) two strings together?

For example:

Input:

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

- Brandon Deo

2

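Most of the answers here assume that your two input strings are the same length. Is that a safe assumption, or do you need to handle strings of different lengths? - SuperBiasedMan

@SuperBiasedMan If you have a solution, it might be helpful to see how to handle all cases. It is relevant to the question, but not to my case specifically. - Brandon Deo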

3

@drexx The top answerer had already commented with a solution for it, so I just edited it into their post to make it more comprehensive. - SuperBiasedMan

14 Answers

66

Faster alternative

Another way:

res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Speed

It looks like it is faster:

%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)

100000 loops, best of 3: 4.75 µs per loop

This is faster than the fastest solution so far:

%timeit "".join(list(chain.from_iterable(zip(u, l))))

100000 loops, best of 3: 6.52 µs per loop

For longer strings:

l1 = 'A' * 1000000; l2 = 'a' * 1000000

%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop


%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)

10 loops, best of 3: 92 ms per loop

Timings measured on Python 3.5.1.
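The `%timeit` and `%%timeit` lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The function names `interleave_slices` and `interleave_chain` are made up for this illustration, and the string length and repeat count are arbitrary; absolute numbers will differ by machine and Python version.

```python
import timeit
from itertools import chain


def interleave_slices(u, l):
    # Slice assignment into a preallocated list (equal-length inputs only).
    res = [''] * len(u) * 2
    res[::2] = u
    res[1::2] = l
    return ''.join(res)


def interleave_chain(u, l):
    # zip + chain.from_iterable, materialized as a list before joining.
    return ''.join(list(chain.from_iterable(zip(u, l))))


l1 = 'A' * 100000
l2 = 'a' * 100000

for func in (interleave_slices, interleave_chain):
    # Time 10 calls and report the average per call in milliseconds.
    seconds = timeit.timeit(lambda: func(l1, l2), number=10)
    print("%s: %.2f ms per call" % (func.__name__, seconds / 10 * 1000))
```

Both functions should return the same string, so only the timing differs.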

Variation for strings with different lengths

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'

The shorter one determines the length (equivalent to `zip()`)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLl

The longer one determines the length (equivalent to `itertools.zip_longest(fillvalue='')`)

min_len = min(len(u), len(l))
res = [''] * min_len * 2 
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ

- Mike Müller

This creates a list [''] * len(u) and then throws it away. Better to use [''] * (len(u) * 2) instead. - Kelly Bundy

In my tests, that made the solution ~10% faster. - Kelly Bundy
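The point of the comment is that `*` is left-associative, so `[''] * len(u) * 2` first builds a list of `len(u)` elements and then multiplies that list by 2. A minimal sketch of the parenthesized form, wrapped in a hypothetical helper `interleave_equal` (the name and the length check are additions for illustration, not part of the answer):

```python
def interleave_equal(u, l):
    """Interleave two equal-length strings via slice assignment."""
    if len(u) != len(l):
        raise ValueError("inputs must have the same length")
    # Allocate the final list in one step, with no intermediate list.
    res = [''] * (len(u) * 2)
    res[::2] = u    # even positions take characters of u
    res[1::2] = l   # odd positions take characters of l
    return ''.join(res)


u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

# Both spellings produce the same list; only the allocation path differs.
assert [''] * len(u) * 2 == [''] * (len(u) * 2)
print(interleave_equal(u, l))
```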

49

Use join() and zip().

>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

- TigerhawkT3

17

To interleave the corresponding elements of u and l and join the result into a single string, you can use ''.join(itertools.chain.from_iterable(zip(u, l))). - Blender

1

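If one input is shorter than the other, this truncates the longer one, because zip stops once the shorter input has been fully iterated. - SuperBiasedMan

5

Right. If that becomes a problem, itertools.zip_longest can be used. - TigerhawkT3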
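The truncation described in these comments is easy to see on a small input. This is only an illustration with made-up strings; it uses the `chain.from_iterable` form from the earlier comment.

```python
from itertools import chain

u = 'ABCDEF'
l = 'abc'

# zip stops after the third pair, so 'DEF' never reaches the output.
truncated = ''.join(chain.from_iterable(zip(u, l)))
print(truncated)       # AaBbCc

# The part of the longer string that was silently dropped.
print(u[len(l):])      # DEF
```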

19

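On Python 2, by far the fastest way to do this is slice assignment into a bytearray: it runs at roughly 3x the speed of list slicing for small strings and roughly 30x for long ones.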

res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)

This does not work on Python 3, though. You could implement something like

res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")

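but by then you have already lost the gains over list slicing for small strings (it is still about 20x the speed for long strings), and this does not even work for non-ASCII characters.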
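The non-ASCII limitation can be shown directly. The sketch below wraps the Python 3 snippet in a hypothetical function `interleave_ascii` (the name is an assumption for illustration) and feeds it one ASCII input and one non-ASCII input.

```python
def interleave_ascii(u, l):
    """Interleave two equal-length ASCII strings via a bytearray."""
    res = bytearray(len(u) * 2)
    res[::2] = u.encode("ascii")    # raises UnicodeEncodeError on non-ASCII
    res[1::2] = l.encode("ascii")
    return res.decode("ascii")


print(interleave_ascii("ABC", "abc"))   # AaBbCc

try:
    interleave_ascii("ÄBC", "abc")
except UnicodeEncodeError as exc:
    # 'Ä' has no ASCII encoding, so the approach fails before any slicing.
    print("non-ASCII input rejected:", exc.reason)
```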

FWIW, if you are doing this on massive strings, need every cycle, and for some reason have to use Python strings, here is how to do it:

res = bytearray(len(u) * 4 * 2)

u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]

l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]

res.decode("utf_32_be")

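Special-casing the common case of smaller character types will help too. FWIW, this is only about 3x the speed of list slicing for long strings, and it is 4 to 5 times slower for small strings.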
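The eight slice assignments follow a regular pattern: byte k of every 4-byte UTF-32 code unit from the first string goes to offset k of each 8-byte pair, and byte k from the second string goes to offset 4 + k. A compact sketch of the same idea with a loop, assuming Python 3 and equal-length inputs (the function name `interleave_utf32` and the length check are additions for illustration):

```python
def interleave_utf32(u, l):
    """Interleave two equal-length strings through UTF-32 byte slicing."""
    if len(u) != len(l):
        raise ValueError("inputs must have the same length")
    u32 = u.encode("utf_32_be")   # 4 bytes per code point, no BOM
    l32 = l.encode("utf_32_be")
    res = bytearray(len(u) * 8)   # 8 bytes per interleaved pair
    for k in range(4):
        res[k::8] = u32[k::4]       # byte k of each character of u
        res[4 + k::8] = l32[k::4]   # byte k of each character of l
    return res.decode("utf_32_be")


print(interleave_utf32("αβγ", "абв"))   # αаβбγв
```

The loop adds a little overhead compared with the unrolled version, so it trades some of the speed that motivated the approach for brevity.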

Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.

- Veedrac

16

If you want the fastest way, you can combine itertools with operator.add:

In [36]: from operator import add

In [37]: from itertools import starmap, izip

In [38]: timeit "".join([i + j for i, j in izip(l1, l2)])
1 loops, best of 3: 142 ms per loop

In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop

In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop

In [41]:  "".join(starmap(add, izip(l1,l2))) ==  "".join([i + j   for i, j in izip(l1, l2)]) ==  "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True

But combining izip and chain.from_iterable is faster again.

In [2]: from itertools import  chain, izip

In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop

There is also a substantial difference between chain(*...) and chain.from_iterable(...).

In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop

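There is no such thing as a generator with join. Passing one is always going to be slower, because Python first builds a list from its contents: join makes two passes over the data, one to work out the size needed and one to actually do the join, and that would not be possible with a generator: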

join.h:

 /* Here is the general case.  Do a pre-pass to figure out the total
  * amount of space we'll need (sz), and see whether all arguments are
  * bytes-like.
   */
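A quick way to check this claim on your own interpreter is to time the generator form against the list-comprehension form. This is a sketch only: the function names are made up, the input size is arbitrary, and which form wins (and by how much) may vary with the Python version and machine.

```python
import timeit

l1 = 'A' * 100000
l2 = 'a' * 100000


def join_generator():
    # join receives a generator and has to materialize it internally.
    return "".join(i + j for i, j in zip(l1, l2))


def join_list():
    # join receives a ready-made list.
    return "".join([i + j for i, j in zip(l1, l2)])


for func in (join_generator, join_list):
    print(func.__name__, timeit.timeit(func, number=10))
```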

If you have strings of different lengths and do not want to lose data, you can use izip_longest:

In [22]: from itertools import izip_longest    
In [23]: a,b = "hlo","elworld"

In [24]:  "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'

On Python 3 it is called zip_longest.
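The snippets in this answer are written for Python 2. A Python 3 rendering, as a sketch: `izip` becomes the built-in `zip`, and `izip_longest` becomes `itertools.zip_longest`. The short strings `l1` and `l2` below are stand-ins for illustration.

```python
from itertools import chain, starmap, zip_longest
from operator import add

# Different lengths: pad the shorter string with '' so nothing is lost.
a, b = "hlo", "elworld"
print("".join(chain.from_iterable(zip_longest(a, b, fillvalue=""))))  # helloworld

# Equal lengths: the starmap/add variant with the built-in zip.
l1, l2 = "ABC", "abc"
print("".join(starmap(add, zip(l1, l2))))  # AaBbCc
```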

But for Python 2, Veedrac's suggestion is by far the fastest:

In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
   ....: 
100 loops, best of 3: 2.68 ms per loop

- Padraic Cunningham

2

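Why the list? It is not needed. - Copperfield

1

Not according to my tests: you lose time building the intermediate list, which defeats the purpose of using iterators. timeit "".join(list(...)) gives me 6.715280318699769, while timeit "".join(starmap(...)) gives me 6.46332361384313. - Copperfield

1

Then is it machine dependent? No matter where I run the test, I get the same result: "".join(list(starmap(add, izip(l1,l2)))) is slower than "".join(starmap(add, izip(l1,l2))). I ran the tests on my machine with Python 2.7.11 and Python 3.5.1, and even in the virtual console at www.python.org with Python 3.4.3. I ran them several times and the result was always the same. - Copperfield

@Copperfield, are you talking about the list call or about passing a list? - Padraic Cunningham

On list(...) being slower: calling list manually does not improve speed. The reason "".join([x for x in y]) is recommended over "".join(x for x in y) is that the latter creates a generator, which has pause-resume overhead. "".join(list(x for x in y)) does not remove that overhead either. - Veedrac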

13

You could also do this using map and operator.add:

from operator import add

u = 'AAAAA'
l = 'aaaaa'

s = "".join(map(add, u, l))

Output:

'AaAaAaAaAa'

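map takes each element from the first iterable u and the corresponding element from the second iterable l, and passes the pair as arguments to the function given as its first argument, add. join then concatenates the results.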
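To make the intermediate step visible, the sketch below materializes what map produces before join sees it. The short inputs are made up for illustration. It also shows that on Python 3 map stops at the shorter input, just like zip.

```python
from operator import add

u = 'ABC'
l = 'abc'

# Each element is add(u[i], l[i]), i.e. a two-character string.
pairs = list(map(add, u, l))
print(pairs)             # ['Aa', 'Bb', 'Cc']
print("".join(pairs))    # AaBbCc

# With unequal lengths, Python 3's map stops at the shorter input.
print(list(map(add, 'ABCD', 'ab')))   # ['Aa', 'Bb']
```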

- root

8

Jim's answer is great, but here is my favorite option, if you don't mind a couple of imports:

from functools import reduce
from operator import add

reduce(add, map(add, u, l))
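One edge case worth knowing: without an initial value, reduce raises TypeError when the inputs are empty. A sketch that passes "" as the initializer to cover that case (the wrapper name `interleave_reduce` is an assumption for illustration):

```python
from functools import reduce
from operator import add


def interleave_reduce(u, l):
    # The initial value "" keeps reduce from raising TypeError on empty input.
    return reduce(add, map(add, u, l), "")


print(interleave_reduce("ABC", "abc"))    # AaBbCc
print(repr(interleave_reduce("", "")))    # ''
```

Because reduce concatenates onto a growing string step by step, it may copy the accumulated result repeatedly, so it is likely a poor fit for very long inputs.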

- knite

7

He said most Pythonic, not most Haskell-ic. - Curt

7

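A lot of these suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems that you might want to accommodate strings of differing lengths too. Or am I the only one who thinks mesh should work a bit like this: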

u = "foobar"
l = "baz"
mesh(u,l) = "fboaozbar"

One way to do this would be the following:

def mesh(a,b):
    minlen = min(len(a),len(b))
    return "".join(["".join(x+y for x,y in zip(a,b)),a[minlen:],b[minlen:]])
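As a quick check, the function can be run against the example given earlier in this answer, and with the arguments swapped so that the second string is the longer one. The definition is repeated here so the snippet runs on its own.

```python
def mesh(a, b):
    minlen = min(len(a), len(b))
    # Interleave the common prefix, then append whichever tail is left over.
    return "".join(["".join(x + y for x, y in zip(a, b)), a[minlen:], b[minlen:]])


print(mesh("foobar", "baz"))   # fboaozbar
print(mesh("baz", "foobar"))   # bfaozobar
```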

- Christofer Ohlsson

5

I like using two fors; the variable names can give a hint or reminder of what is going on:

"".join(char for pair in zip(u,l) for char in pair)

- Neal Fultz
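For readers less used to comprehensions with two for clauses, the expression in the answer above reads left to right like nested loops. A sketch of the equivalent explicit form, using short made-up inputs:

```python
u = 'ABC'
l = 'abc'

chars = []
for pair in zip(u, l):     # outer for: one (upper, lower) tuple per position
    for char in pair:      # inner for: the two characters of that tuple
        chars.append(char)
result = "".join(chars)

print(result)   # AaBbCc
```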

4

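It feels a bit un-Pythonic not to consider the double-comprehension answer here, which handles n strings with O(1) extra effort:

"".join(c for cs in itertools.zip_longest(*all_strings, fillvalue="") for c in cs)

where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full usage example looks like this: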

import itertools
a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a,b]
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings, fillvalue="") for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
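The same pattern extends to any number of strings, including strings of different lengths. One detail to watch: zip_longest pads with None by default, and str.join rejects None, so passing fillvalue="" matters once the lengths differ. A sketch with a hypothetical helper `interleave_all` and made-up inputs:

```python
from itertools import zip_longest


def interleave_all(strings):
    """Interleave any number of strings; shorter ones simply run out."""
    return "".join(c for cs in zip_longest(*strings, fillvalue="") for c in cs)


print(interleave_all(["abc", "12", "XYZW"]))   # a1Xb2YcZW
```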

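Like many of the answers, is it the fastest? Probably not, but it is simple and flexible. Also, without much added complexity, it is slightly faster than the accepted answer (in general, string addition is a bit slow in Python):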

In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000;

In [8]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop

In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop

- scnerd

Still not as fast as the fastest answer, though: on the same data and the same machine, the fastest answer took 50.3 ms. - scnerd


- Dimitris Fasarakis Hilliard · Accepted Answer

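For me, the most Pythonic* way is the following, which does pretty much the same thing but uses the + operator to concatenate the individual characters of each string: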

res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

It is also faster than using two join() calls:

In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000

In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop

In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop

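Faster approaches exist, but they often obfuscate the code.

Note: If the two input strings are not the same length, the longer one will be truncated, because zip stops iterating at the end of the shorter string. In that case, use zip_longest (izip_longest in Python 2) from the itertools module instead of zip, so that both strings are fully consumed.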

* Quoting the Zen of Python: Readability counts.
To me, Pythonic = readable; i + j is simply easier to parse visually, at least to my eyes.
