Element-wise XOR in Pandas

Question


python pandas logic xor


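I know that in a Pandas Series logical AND is written with the & symbol and logical OR with the | symbol, but I'm looking for an element-wise logical XOR. I could express it in terms of AND and OR, but I'd rather use XOR if one is available.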
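A minimal sketch with hypothetical data, showing that the ^ operator does element-wise XOR on boolean Series and matches the AND/OR formulation from the question. For boolean data without missing values, != gives the same result.

```python
import pandas as pd

# Two small boolean Series (illustrative data)
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

# Element-wise XOR with the ^ operator
xor_op = a ^ b

# The same result spelled out with OR, AND and NOT
xor_and_or = (a | b) & ~(a & b)

# For boolean data without missing values, != is also an XOR
xor_ne = a != b

print(xor_op.tolist())  # [False, True, True, False]
```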

Thanks!

- SulfoCyaNate

2 Answers


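I ran into a case where a^b and np.logical_xor(a,b) are not equivalent. It confused me a lot, but in the end the fix was simple. I hope this saves someone else a headache.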

I recently upgraded from Pandas 0.25.3 to 2.0.3 (and numpy from 1.19.0 to 1.24.4), which triggered the problem.

Suppose a is a DataFrame with duplicate values in its index, and b is a bool Series with b.index == a.columns.

My intent was to broadcast b across a and XOR each row of a element-wise with b, with any duplicate values in a.index carried through to the output.
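A minimal sketch of this setup, with hypothetical column names and dates. It is not the poster's code: it computes the intended row-wise XOR directly on the underlying NumPy arrays, which sidesteps pandas index alignment entirely, and then re-attaches a's original (duplicated) index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: duplicate timestamps in the index, boolean values
idx = pd.to_datetime(["2023-01-02", "2023-01-02", "2023-01-03"])
a = pd.DataFrame(
    {"x": [True, False, True], "y": [False, False, True]},
    index=idx,
)
b = pd.Series({"x": True, "y": False})  # b.index matches a.columns

# Put b in the same order as a's columns, then let NumPy broadcast the
# 1-D array of shape (n_cols,) across every row of the 2-D array.
arr = np.logical_xor(a.to_numpy(), b.reindex(a.columns).to_numpy())

# Re-attach the original (duplicated) index and the columns
result = pd.DataFrame(arr, index=a.index, columns=a.columns)
print(result)
```

Because only plain arrays are combined, no index has to be unioned or sorted; the trade-off is that the column order must be matched by hand, which is what the reindex call does.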

This code worked fine on my old setup...

np.logical_xor(a,b.to_frame().T)

...but fails on my new setup:

TypeError: '<' not supported between instances of 'Timestamp' and 'int'

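I believe this is because something in the broadcasting tries to align b (whose index after b.to_frame().T is a meaningless [0]) with a, which has a timestamp index, and I believe the combined index has to be sorted to make it monotonic.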
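To see where the [0] comes from: turning an unnamed Series into a one-row frame with to_frame().T produces an index containing only the integer 0, which pandas then tries to align with a's timestamp index when np.logical_xor receives two DataFrames. This snippet uses a hypothetical b and only shows the indexes involved; it does not reproduce the error itself, which depends on the pandas/NumPy versions.

```python
import pandas as pd

b = pd.Series({"x": True, "y": False})  # unnamed bool Series (hypothetical)

row = b.to_frame().T  # one-row DataFrame
print(row.index.tolist())    # [0]
print(row.columns.tolist())  # ['x', 'y']
```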

The solution, which the asker of this question prompted me to consider, was:

a^b
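A minimal sketch of why a^b fits here, assuming a is a bool DataFrame and b a bool Series indexed by a.columns (the data are hypothetical). When a DataFrame is combined with a Series, pandas matches the Series index against the DataFrame's columns and applies the operator to every row, so a's row index, duplicates included, is left untouched.

```python
import pandas as pd

# Hypothetical data with a duplicated timestamp in the index
idx = pd.to_datetime(["2023-01-02", "2023-01-02", "2023-01-03"])
a = pd.DataFrame({"x": [True, False, True], "y": [False, False, True]}, index=idx)
b = pd.Series({"x": True, "y": False})

# DataFrame ^ Series aligns b.index with a.columns and XORs each row with b;
# the row index of a (including the duplicate) is carried into the result.
result = a ^ b
print(result)
```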

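The annoying/wonderful part is that this also seems to work on my old pandas/numpy "production" setup. Coincidentally, this was my first time using "git blame". The answer: "Initial commit", 3 years ago. So either a^b didn't work in earlier versions of Pandas, or I simply didn't know it existed.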

- Adam Fuller

Web page content provided by Stack Overflow. The original English post is available at the original link.

- jezrael · Accepted Answer

Python XOR operator: a ^ b

NumPy logical XOR: np.logical_xor(a,b)
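A quick check with seeded random data that, for same-shaped boolean inputs, the two spellings give identical results, both on NumPy arrays and on pandas Series sharing an index (applying the NumPy ufunc to Series returns a Series).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.choice([True, False], size=10)
b = rng.choice([True, False], size=10)

# On boolean NumPy arrays both spellings give identical results
same_arrays = np.array_equal(a ^ b, np.logical_xor(a, b))

# The same holds for boolean pandas Series with a shared index
sa, sb = pd.Series(a), pd.Series(b)
same_series = (sa ^ sb).equals(np.logical_xor(sa, sb))

print(same_arrays, same_series)
```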

Timings (both perform the same):

1. Random boolean arrays of size 10000

In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)

In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

2. Random boolean arrays of size 1000

In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)

In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

3. Random boolean arrays of size 100

In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)

In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop

In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop

4. Random boolean arrays of size 10

In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)

In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop

In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
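%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module; the compare helper below is hypothetical, and the numbers it prints depend on the machine and library versions, so none are quoted here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)


def compare(size, number=1000):
    """Return best seconds-per-call for a ^ b and for np.logical_xor(a, b)."""
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)
    # Take the best of 3 repeats, like %timeit's "best of 3"
    t_op = min(timeit.repeat(lambda: a ^ b, number=number, repeat=3)) / number
    t_fn = min(timeit.repeat(lambda: np.logical_xor(a, b), number=number, repeat=3)) / number
    return t_op, t_fn


for size in (10, 100, 1000, 10000):
    t_op, t_fn = compare(size)
    print(f"size={size:>6}: a ^ b {t_op * 1e6:.2f} us, logical_xor {t_fn * 1e6:.2f} us")
```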