An efficient way to compute an XOR (^) checksum from a list of IDs

Question

4

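While googling for information about Python list comprehensions, I was offered a Google foobar challenge, which I have been slowly working through for fun over the past few days. The latest challenge: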

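The task calls for efficiently generating a list of IDs, ignoring an increasing number of IDs on each new row until only one ID is left. You are then supposed to XOR (^) those IDs together to produce a checksum. I wrote a working program that outputs the correct answer, but it is not efficient enough to pass all the test cases (it passes 6/10) within the allotted time. A length of 50,000 should produce a result in under 20 seconds, but mine takes 320 seconds.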

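Could someone steer me in the right direction, but please do not write the code for me? I am using this challenge to push myself. Perhaps there is a data structure or algorithm I could implement to speed up the calculation?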

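The logic behind the code:

First, the starting ID and the length are taken as input
A list of IDs is generated, ignoring an increasing number of IDs on each new row, starting with 0 ignored IDs on the first row
A for loop XORs all of the numbers in the ID list
The answer is returned as an int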

import timeit
def answer(start,length):
    x = start
    lengthmodified = length
    answerlist = []
    for i in range (0,lengthmodified): #Outer for loop runs a number of times equal to the variable "length".
        prestringresult = 0
        templist = []
        for y in range (x,x + length): #Fills list with ids for new line
            templist.append(y)
        for d in range (0,lengthmodified): #Ignores an id from each line, increasing by one with each line, and starting with 0 for the first
            answerlist.append(templist[d])
        lengthmodified -= 1
        x += length    
        for n in answerlist: #XORs all of the numbers in the list via a loop and saves to prestringresult
            prestringresult ^= n
        stringresult = str(prestringresult) 
        answerlist = [] #Empties the list
        answerlist.append(int(stringresult)) #Adds the result of XORing all of the numbers in the list to the answer list
    #print(answerlist[0]) #Print statement allows value that's being returned to be checked, just uncomment it
    return (answerlist[0]) #Returns Answer



#start = timeit.default_timer()
answer(17,4)
#stop = timeit.default_timer()
#print (stop - start)
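
To make the row layout concrete, here is a minimal sketch that lists which IDs each row contributes for the sample call `answer(17, 4)`. The helper name `kept_ids` is hypothetical; it is not part of the challenge or of the code above, and it only illustrates the description, not an efficient solution.

```python
def kept_ids(start, length):
    """Return, row by row, the IDs that are included in the checksum."""
    rows = []
    for row in range(length):
        first = start + row * length  # first ID on this row
        # Row number `row` ignores its last `row` IDs.
        rows.append(list(range(first, first + length - row)))
    return rows


for ids in kept_ids(17, 4):
    print(ids)
# [17, 18, 19, 20]
# [21, 22, 23]
# [25, 26]
# [29]
```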

- Mrcitrusboots

You have two inner loops. Try to eliminate them. - Michael

4 Answers

3

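Most people run into the time limit on this problem. So did I! The problem can be summarized as: "Find the XOR of all the numbers in a given range in constant time." Yes, constant time!

So for 3 to 6, the answer should be 3^4^5^6 = 4, found in O(1) time.

Solution: XOR is associative and commutative, so the operands can be regrouped and reordered freely; for example, A ^ B ^ C can be written as B ^ A ^ C. We also know what XOR does bit by bit: two identical bits give 0 and two different bits give 1, so any number XORed with itself is 0.

Using these two properties, the XOR of all the numbers from 3 to 6 can be written as: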

3^4^5^6 = (0^1^2) ^ (0^1^2) ^ (3^4^5^6)   (since (0^1^2) ^ (0^1^2) = 0)
        = (0^1^2^3^4^5^6) ^ (0^1^2)       (regrouping; XOR is associative and commutative)
        = (XOR of all the numbers from 0 to 6) ^ (XOR of all the numbers from 0 to 2)   ...eq(1)
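
Eq. (1) can be checked by brute force before relying on it. The sketch below is only a sanity check, not the constant-time method; the helper name `xor_range` is an assumption introduced for illustration.

```python
from functools import reduce
from operator import xor


def xor_range(lo, hi):
    """Brute-force XOR of every integer from lo to hi, inclusive."""
    return reduce(xor, range(lo, hi + 1), 0)


direct = xor_range(3, 6)                           # 3 ^ 4 ^ 5 ^ 6
via_prefixes = xor_range(0, 6) ^ xor_range(0, 2)   # right-hand side of eq(1)
print(direct, via_prefixes)                        # 4 4
```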

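Now, if we can find the XOR of all the numbers from 0 to some integer in constant time, we have our answer.

Fortunately, there is a pattern:

Take these as an example: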

(0-1): 0 ^ 1 = 1 (1)
(0-2): 0 ^ 1 ^ 2 = 3 (2+1)
(0-3): 0 ^ 1 ^ 2 ^ 3 = 0 (0)
(0-4): 0 ^ 1 ^ 2 ^ 3 ^ 4 = 4 (4)

(0-5): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 = 1 (1)
(0-6): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 = 7 (6+1)
(0-7): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 ^  7 = 0 (0)
(0-8): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 ^ 7 ^ 8 = 8 (8)


So the pattern for the XOR of all the integers from 0 to n is:
if n%4 == 1 then, answer = 1
if n%4 == 2 then, answer = n+1
if n%4 == 3 then, answer = 0
if n%4 == 0 then, answer = n

Therefore, XOR(0-6) becomes 7 (since 6%4 ==2) and XOR(0-2) becomes 3 (since 2%4 ==2)

Therefore, the eq(1) now becomes:
3^4^5^6 = 7 ^ 3 = 4
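
The n % 4 pattern can be compared against a running brute-force XOR for small n. This is a hedged sketch: the name `prefix_xor` is hypothetical, and the check only covers the range it actually loops over.

```python
def prefix_xor(n):
    """XOR of all integers from 0 to n inclusive, via the n % 4 pattern (n >= 0)."""
    return [n, 1, n + 1, 0][n % 4]


running = 0
for n in range(1000):
    running ^= n                      # brute-force XOR of 0..n
    assert prefix_xor(n) == running   # the pattern agrees for every n tested

print(prefix_xor(6) ^ prefix_xor(2))  # 4, matching 3^4^5^6
```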

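Now the problem is simple. Most of us get stuck on the time-limit error because we try to XOR every number inside every loop iteration, which becomes enormous as the input size and the number of iterations grow.

Here is my working solution in Python, which passed all of Google's test cases: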

#Main Program
def answer(start, length):
    checkSum = 0
    for l in range(length, 0, -1):
        checkSum = checkSum ^ (getXor(start + l-1) ^ getXor(start-1))
        start = start + length
    return checkSum

def getXor(x):
    result = [x, 1, x+1, 0]
    return result[x % 4]
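
One detail worth noting: when `start` is 0, `getXor(start-1)` is called with -1. The sketch below shows why the lookup still returns the right value in Python, where `%` with a positive divisor never yields a negative result. In languages where `%` can return a negative remainder (C and Java, for example), the same lookup would presumably need an explicit guard for negative input.

```python
def getXor(x):
    # Same lookup as in the solution above.
    result = [x, 1, x + 1, 0]
    return result[x % 4]


print(-1 % 4)       # 3 in Python
print(getXor(-1))   # 0, the XOR of an empty range

# First row of answer(0, 3): IDs 0, 1, 2
print(getXor(2) ^ getXor(-1))   # 3
```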

- Suresh Lamichhane

2

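Even without using lists I got some improvement, but it still fails for large numbers. The nested loops slow it down. I think you need to follow Pochmann's logic, because brute force is usually not the right approach for this type of problem.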

- Michael

1

There's no need for templist or answerlist. Let's walk through your code to see how to eliminate them.

First, let's make templist's initialization a one-liner. This:

templist = []
for y in range (x,x + length):
    templist.append(y)

Becomes this:

templist = list(range(x, x + length))

Then let's do the same for answerlist. This:

for d in range (0,lengthmodified):
    answerlist.append(templist[d])

Becomes this:

answerlist.extend(templist[:lengthmodified])

Now let's take a look at how they're used later. If we ignore lengthmodified -= 1 and x += length for now, we have:
```
templist = list(range(x, x + length))
answerlist.extend(templist[:lengthmodified])

for n in answerlist:
    prestringresult ^= n

answerlist = []
```
Instead of extending answerlist, iterating over it, and then clearing it, it'd be faster to just iterate over templist.
```
templist = list(range(x, x + length))

for n in templist[:lengthmodified]:
    prestringresult ^= n
```
And now there's no need for templist either, so let's skip building it as well.
```
for n in range(x, x + lengthmodified):
    prestringresult ^= n
```
templist and answerlist are gone.

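The only thing missing here is the work of writing answerlist.append(int(stringresult)) back in. I'll leave that for you to figure out.

Overall, the lesson is to avoid explicit for loops where possible. Writing lots of for loops that iterate over containers is a C way of thinking. In Python, there are usually ways to process a collection in one go. Doing so takes advantage of the language's fast built-in operations.

As an added benefit, idiomatic Python is also easier to read.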
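
As one possible reading of "processing a collection in one go", the XOR loop can be folded with `functools.reduce` and `operator.xor`. This is a sketch with example values for `x` and `lengthmodified`, not necessarily what was intended above, and it still touches every number in the row, so it does not change the overall complexity.

```python
from functools import reduce
from operator import xor

x, lengthmodified = 17, 4   # example values: first row of answer(17, 4)

# Explicit loop, as in the final step above
prestringresult = 0
for n in range(x, x + lengthmodified):
    prestringresult ^= n

# The same fold expressed with built-ins
folded = reduce(xor, range(x, x + lengthmodified), 0)

print(prestringresult, folded)   # 4 4
```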

- John Kugelman

How much faster does this make it? - Stefan Pochmann

- Stefan Pochmann · Accepted Answer

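You will probably need a different approach, not just minor improvements like John's. I just wrote a solution that can do answer(0, 50000) in about 2 seconds on my PC. I still go row by row, but instead of XORing all the numbers in the row's range, I do it bit by bit. How many numbers in the row have the 1-bit set [*]? An odd number of them? Then I flip the 1-bit of my answer. Then the same for the 2-bit, the 4-bit, the 8-bit and so on, up to the 2^30-bit. So for each row there are just 31 small calculations (instead of actually XORing tens of thousands of numbers).

[*] This can be computed quickly, in constant time, from just the start and end of the range.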

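Edit: Since you asked for another hint, here is how to calculate how often the 1-bit is set in some range(a, b). Calculate how often it is set in range(0, a) and subtract that from how often it is set in range(0, b). It is easier when the range starts at zero. How often is the 1-bit set in some range(0, c)? Easy: c // 2 times. So how often is the 1-bit set in some range(a, b)? Simply b // 2 - a // 2 times. The higher bits are similar, just a bit more complicated.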
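
The hint only spells out the 1-bit, so the following is a hedged sketch of how the higher bits might be counted; the formula for bit k is a reconstruction, not taken from the answer. The names `count_bit_set` and `xor_range_bitwise` are hypothetical, ranges are half-open like Python's `range`, and IDs are assumed to be non-negative and to fit in the 31 bits mentioned above.

```python
def count_bit_set(c, k):
    """How many integers in range(0, c) have bit k set."""
    period = 1 << (k + 1)   # bit k repeats with this period
    half = 1 << k           # off for `half` numbers, then on for `half`
    full, rest = divmod(c, period)
    return full * half + max(0, rest - half)


def xor_range_bitwise(a, b, bits=31):
    """XOR of all integers in range(a, b), decided one bit at a time."""
    result = 0
    for k in range(bits):
        ones = count_bit_set(b, k) - count_bit_set(a, k)
        if ones % 2:        # odd count means bit k is set in the XOR
            result |= 1 << k
    return result


print(count_bit_set(10, 0), 10 // 2)   # 5 5
print(xor_range_bitwise(3, 7))         # 4  (3 ^ 4 ^ 5 ^ 6)
```

For k = 0 the count reduces to c // 2, which matches the hint.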

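Edit 2: Oh wait, I just remembered... there is a simple way to compute the XOR of all the numbers in some range(a, b). Again, split the work into doing range(0, a) and range(0, b). The XOR of all the numbers in some range(0, c) is easy because there is a nice pattern (you will see it if you do it for all c from 0 to, say, 30). Using this, I can now solve answer(0, 50000) in about 0.04 seconds.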
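
A minimal sketch of that pattern in the half-open `range` form used in this answer. The names `xor_below` and `xor_range` are assumptions for illustration, and the lookup table is one way of writing the pattern, not necessarily how the 0.04-second solution was written.

```python
def xor_below(c):
    """XOR of all integers in range(0, c), i.e. 0 ^ 1 ^ ... ^ (c - 1), for c >= 0."""
    return [0, c - 1, 1, c][c % 4]


def xor_range(a, b):
    """XOR of all integers in range(a, b), for 0 <= a <= b."""
    return xor_below(b) ^ xor_below(a)


for c in range(9):
    print(c, xor_below(c))   # shows the repeating pattern for small c

print(xor_range(3, 7))       # 4  (3 ^ 4 ^ 5 ^ 6)
```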