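I know that in a Pandas Series, logical AND is written with & and logical OR with |, but I am looking for an element-wise logical XOR. I could express it with AND and OR, but I would rather use XOR if it is available.
Thanks!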
Python XOR: a ^ b
NumPy logical XOR: np.logical_xor(a, b)
Performance test - the timings are about equal:
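As a minimal sketch of both spellings on boolean Series (the sample values below are made up for illustration), together with the AND/OR form mentioned in the question:

```python
import numpy as np
import pandas as pd

# Illustrative boolean Series covering all four input combinations.
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

xor_op = a ^ b                    # Python's ^ operator, element-wise on Series
xor_np = np.logical_xor(a, b)     # NumPy ufunc, also returns a Series here
xor_manual = (a | b) & ~(a & b)   # XOR spelled out with AND, OR and NOT

print(xor_op.tolist())
```

All three results should agree element by element for boolean input.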
1. Random boolean sequence (NumPy array) of size 10000
In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)
In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop
In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop
2. Random boolean sequence (NumPy array) of size 1000
In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)
In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop
In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop
3. Random boolean sequence (NumPy array) of size 100
In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)
In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop
In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop
4. Random boolean sequence (NumPy array) of size 10
In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)
In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop
In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
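For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper name compare, the fixed seed, and the repeat counts are my own choices, not part of the measurements above, and absolute timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducible inputs


def compare(size, number=1000):
    """Return best per-call times in seconds for a ^ b and np.logical_xor(a, b)."""
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)
    # Both spellings must give the same answer before timing them.
    assert np.array_equal(a ^ b, np.logical_xor(a, b))
    t_op = min(timeit.repeat(lambda: a ^ b, number=number, repeat=3)) / number
    t_fn = min(timeit.repeat(lambda: np.logical_xor(a, b), number=number, repeat=3)) / number
    return t_op, t_fn


for size in (10, 100, 1000, 10000):
    t_op, t_fn = compare(size)
    print(f"size={size:>5}: a ^ b {t_op * 1e6:.2f} us, np.logical_xor {t_fn * 1e6:.2f} us")
```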
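a ^ b and np.logical_xor(a, b) are not equivalent, which confused me for a while, but in the end the fix was simple. I hope this saves others a headache.

My setup: a is a DataFrame with duplicate values in its Index, and b is a bool Series where b.index == a.columns. b is broadcast against a, and an element-wise XOR is applied between each row of a and b; any duplicate values in a.index should carry through to the output.

np.logical_xor(a, b.to_frame().T)

...but it failed on my new setup:

TypeError: '<' not supported between instances of 'Timestamp' and 'int'

As far as I can tell, b (after the transpose, its index is a meaningless [0]) is joined onto a, which has a timestamp index, and I believe the combined index needs to be sorted to make it monotonic. Comparing a Timestamp with the integer 0 during that sort is what raises the error.

The simple fix:

a ^ b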
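A small sketch of the DataFrame-with-Series case described above. The column names, dates, and values are invented for illustration; the only assumptions carried over from the description are a timestamp index with a duplicate and b.index matching a.columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a timestamp index that contains a duplicated value.
idx = pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"])
a = pd.DataFrame(
    {"x": [True, False, True], "y": [False, False, True]},
    index=idx,
)
# b is aligned on a's columns, so b.index equals a.columns.
b = pd.Series([True, False], index=a.columns)

# DataFrame ^ Series matches b's index against a's columns and applies
# the XOR to every row; a's row index is not involved in the alignment.
result = a ^ b

# Same computation on the raw arrays, relying on NumPy broadcasting only.
expected = np.logical_xor(a.to_numpy(), b.to_numpy())

print(result)
```

Because the alignment happens on the columns, the duplicated timestamps in a.index should pass through to the result unchanged.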