非常感谢这个有趣的问题!
我用纯Python从头实现了Dixon分解算法,提供了三种不同版本:
使用最简单的筛法。我创建了一个u64
数组,其中包含范围为[N; N * 2)的所有数字,这些数字表示z^2
值。该数组保存了质数乘积的结果。然后通过筛选过程,我迭代所有因子基质数,并在那些可被p
整除的k
位置上执行array[k] *= p
。最后,当筛选好的数组准备就绪时,我检查两个条件:a)数组索引k
是完全平方数,b)array[k] == k - N
。第二个条件b)
意味着所有乘以p
质数的结果给出最终数字,只有当数字仅被因子基质数整除时才成立,即它是B-平滑的。这是我三种解决方案中最简单和最慢的。
第二种解决方案使用SymPy库来分解每个z^2
。我迭代所有可能的z
并执行sympy.factorint(z * z)
,这会给出z^2
的分解。如果此分解仅包含小质数,即来自因子基的质数,则我会收集这样的z
和z^2
以供后续处理。这个算法版本也很慢,但比第一个要快得多。
第三种解决方案使用Quadratic Sieve中使用的一种筛法。这种筛选过程是三种算法中最快的。基本上它找到了方程x^2 = N (mod p)
在因子基中所有质数的根,由于我只有几个质数,所以通过简单的循环遍历所有变量来完成根查找,对于更大的质数可以使用Shanks Tonelli算法来查找根,这是非常快的。只有约50%的质数实际上给出了根解,因此只有一半的质数实际上用于二次筛法。这种方程的根可以用于一次生成许多解,因为对于所有k
,root + k * p
都是有效的解。筛选是通过array[offset(root) :: p] += Log2(p)
完成的。这里我使用添加质数的对数而不是乘法。首先,添加数字比乘法快一点。其次,更重要的是,它支持任何大小的数字,例如256位。而乘法只能进行64位数字,因为Numpy不支持128或256位整数。添加对数后,我检查哪些对数等于原始z^2
数字的对数,这些数字是最终筛选出来的数字。
在上述三种算法筛选所有z^2
之后,我通过高斯消元算法进行线性代数阶段。这个阶段的目的是找到这样一组B平滑的z^2
数字,它们的质因数相乘后可以得到最终数字,其中所有质数的幂次均为偶数。
我们将一个三元组z,z^2,z^2的质因数
称为关系。基本上,所有关系都会被提供给高斯消元阶段,从中找到偶数组合。
质数的偶数幂给我们等式a^2 = b^2 (mod N)
,从中我们可以通过执行factor = GCD(a + b, N)
来获得一个因子,其中GCD是通过欧几里得算法找到的最大公约数。这个GCD有时会给出平凡的因子1和N,在这种情况下,应该检查其他偶数组合。
为了确保得到偶数组合,我会进行筛选阶段,直到找到比素数数量多一些的关系数,实际上是素数数量的105%左右。这额外的5%关系确保我们在高斯阶段肯定会获得依赖的线性方程。所有这些依赖方程形成偶数组合。
实际上,我们需要更多的依赖方程,不仅仅是素数的1个以上,而是大约多5%-10%,因为其中一些(根据我的实验观察,大约50-60%)依赖只给出平凡因子1或N。因此需要额外的方程。
请查看我的帖子末尾的控制台输出。这个控制台输出显示了我的程序的所有印象。在那里,我同时运行第2个(Sieve_B)和第3个(Sieve_C)算法并行(多线程)。第一个(Sieve_A)由于速度太慢,等待完成需要很长时间,因此没有在我的程序中运行。
在源文件的最后,您可以调整变量bits = 64
到其他大小,例如bits = 96
。这是复合数N中的位数。该N是由两个相等大小的随机质数的乘积创建的。由两个相等大小的质数组成的这样的复合数通常称为RSA Number。
还要找到B = 1 << 10
,这告诉B平滑度的程度,基本上因子基由所有可能的质数< B
组成。您可以增加此B限制,这将使筛选出的z^2
的答案更频繁,因此整个分解过程变得更快。巨大的B的唯一限制是线性代数阶段(高斯消元),因为随着因子基越来越大,您必须解决更多更大的线性方程。我的高斯方法并不是非常优化,例如,您可以将位作为密集的np.uint64
而不是保持位作为np.uint8
,这将使线性代数速度提高8倍。
您还可能会找到变量M = 1 << 23
,它告诉我们筛法数组的大小有多大,换句话说,这是一次处理的块大小。更大的块速度略快,但差别不大。更大的M值不会带来太大差异,因为它只是告诉筛法进程被拆分成了什么大小的任务,它并不影响任何计算能力。此外,更大的M将占用更多的内存,所以您不能无限增加它,只能增加到您有足够的内存的程度。
除了上述所有算法,我还使用了Fermat Primality Test,以及Eratosthenes筛选法(用于生成素数因子基)。
此外,我还实现了自己的过滤平方数算法。为此,我选择了一些看起来接近Primorial的复合模数,例如mod = 2 * 2 * 2 * 3 * 3 * 5 * 7 * 11 * 13
。在布尔数组中,我标记所有模数mod
的数字,这些数字是平方数。稍后,当需要检查任何数字K是否为平方数时,我获取flag_array[K%mod]
,如果它为True,则该数字“可能”是平方数,而如果它为False,则该数字“绝对”不是平方数。因此,此过滤器有时会产生误报,但永远不会产生误判。此过滤器检查阶段可过滤掉95%的非平方数,剩下的5%“可能”是平方数,可以通过math.isqrt()进行双重检查。
请点击下面的
在线试用!
链接,在ReplIt的在线服务器上测试运行我的程序。这将给您最好的印象,特别是如果您没有Python或个人笔记本电脑。在安装了
python -m pip numpy sympy
之后,我的代码可以直接运行。
在线试用!
import threading
def GenPrimes_SieveOfEratosthenes(end):
import numpy as np
composites = np.zeros((end,), dtype = np.uint8)
for p in range(2, len(composites)):
if composites[p]:
continue
if p * p >= end:
break
composites[p * p :: p] = 1
primes = []
for p in range(2, len(composites)):
if not composites[p]:
primes.append(p)
return np.array(primes, dtype = np.uint32)
def Print(*pargs, __state = (threading.RLock(),), **nargs):
with __state[0]:
print(*pargs, flush = True, **nargs)
def IsSquare(n, *, state = []):
if len(state) == 0:
import numpy as np
Print('Pre-computing squares filter...')
squares_filter = 2 * 2 * 2 * 3 * 3 * 5 * 7 * 11 * 13
squares = np.zeros((squares_filter,), dtype = np.uint8)
squares[(np.arange(0, squares_filter, dtype = np.uint64) ** 2) % squares_filter] = 1
state.extend([squares_filter, squares])
if not state[1][n % state[0]]:
return False, None
import math
root = math.isqrt(n)
return root ** 2 == n, root
def FactorRef(x):
import sympy
return dict(sorted(sympy.factorint(x).items()))
def CheckZ(z, N, primes):
z2 = pow(z, 2, N)
factors = FactorRef(z2)
assert all(p <= primes[-1] for p in factors), (primes[-1], factors, N, z, z2)
return z
def SieveSimple(N, primes):
import time, math, numpy as np
Print('Simple Sieve of B-smooth z^2...')
sieve_block = 1 << 21
rep0_time = 0
for iiblock, iblock in enumerate(range(N, N * 2, sieve_block)):
if time.time() - rep0_time >= 30:
Print(f'Block {iiblock:>3} (2^{math.log2(max(iblock - N, 1)):>5.2f})')
rep0_time = time.time()
iblock_end = iblock + sieve_block
sieve_arr = np.ones((sieve_block,), dtype = np.uint64)
iblock_modN = iblock % N
for p in primes:
mp = 1
while True:
if mp * p >= sieve_block:
break
mp *= p
off = (mp - iblock_modN % mp) % mp
sieve_arr[off :: mp] *= p
for i in range(1 if iblock == N else 0, sieve_block):
num = iblock + i
z2 = num - N
if sieve_arr[i] < z2:
continue
assert sieve_arr[i] == z2, (sieve_arr[i], round(math.log2(sieve_arr[i]), 3), z2)
is_square, z = IsSquare(num)
if not is_square:
continue
yield CheckZ(z, N, primes)
def SieveFactor(N, primes):
import math
Print('Factor Sieve of B-smooth z^2...')
for iz, z in enumerate(range(math.isqrt(N - 1) + 1, math.isqrt(N * 2 - 1) + 1)):
z2 = z ** 2 - N
assert 0 <= z2 and z2 < N, (z, z2)
factors = FactorRef(z2)
if any(p > primes[-1] for p in factors):
continue
yield CheckZ(z, N, primes)
def BinarySearch(begin, end, Test):
while begin + 1 < end:
mid = (begin + end - 1) >> 1
if Test(mid):
end = mid + 1
else:
begin = mid + 1
assert begin + 1 == end and Test(begin), (begin, end, Test(begin))
return begin
def ModSqrt(n, p):
n %= p
def Ret(x):
if pow(x, 2, p) != n:
return []
nx = (p - x) % p
if x == nx:
return [x]
elif x <= nx:
return [x, nx]
else:
return [nx, x]
for i in range(p):
if pow(i, 2, p) == n:
return Ret(i)
return []
def SieveQuadratic(N, primes):
import math, numpy as np
M = 1 << 23
def Log2I(x):
return int(round(math.log2(max(1, x)) * (1 << 24)))
def Log2IF(li):
return li / (1 << 24)
Print('Quadratic Sieve of B-smooth z^2...')
plogs = {}
for p in primes:
plogs[int(p)] = Log2I(int(p))
qprimes = []
B = int(primes[-1]) + 1
for p in primes:
p = int(p)
res = []
mp = 1
while True:
if mp * p >= B:
break
mp *= p
roots = ModSqrt(N, mp)
if len(roots) == 0:
if mp == p:
break
continue
res.append((mp, tuple(roots)))
if len(res) > 0:
qprimes.append(res)
qprimes_lin = np.array([pinfo[0][0] for pinfo in qprimes], dtype = np.uint32)
yield qprimes_lin
Print('QSieve num primes', len(qprimes), f'({len(qprimes) * 100 / len(primes):.1f}%)')
x_begin0 = math.isqrt(N - 1) + 1
assert N <= x_begin0 ** 2
for iblock in range(1 << 30):
if (x_begin0 + (iblock + 1) * M) ** 2 - N >= N:
break
x_begin = x_begin0 + iblock * M
if iblock != 0:
Print('\n', end = '')
Print(f'Block {iblock} (2^{math.log2(max(1, x_begin ** 2 - N)):>6.2f})...')
a = np.zeros((M,), np.uint32)
for pinfo in qprimes:
p = pinfo[0][0]
plog = np.uint32(plogs[p])
for imp, (mp, roots) in enumerate(pinfo):
off_done = set()
for root in roots:
for off in range(mp):
if ((x_begin + off) ** 2 - N) % mp == 0 and off not in off_done:
break
else:
continue
a[off :: mp] += plog
off_done.add(off)
logs = np.log2(np.array((np.arange(M).astype(np.float64) + x_begin) ** 2 - N, dtype = np.float64))
logs2if = Log2IF(a.astype(np.float64))
logs_diff = np.abs(logs - logs2if)
for ix in range(M):
if logs_diff[ix] > 0.3:
continue
z = x_begin + ix
z2 = z * z - N
factors = FactorRef(z2)
assert all(p <= primes[-1] for p, c in factors.items())
yield CheckZ(z, N, primes)
def LinAlg(N, zs, primes):
import numpy as np
Print('Linear algebra...')
Print('Factoring...')
m = np.zeros((len(zs), len(primes) + len(zs)), dtype = np.uint8)
def SwapRows(i, j):
t = np.copy(m[i])
m[i][...] = m[j][...]
m[j][...] = t[...]
def MatToStr(m):
s = '\n'
for i in range(len(m)):
for j in range(len(m[i])):
s += str(m[i, j])
s += '\n'
return s[1:-1]
for iz, z in enumerate(zs):
z2 = z * z - N
fs = FactorRef(z2)
for p, c in fs.items():
i = np.searchsorted(primes, p, 'right') - 1
assert i >= 0 and i < len(primes) and primes[i] == p, (i, primes[i])
m[iz, i] = (int(m[iz, i]) + c) % 2
m[iz, len(primes) + iz] = 1
Print('Gaussian elemination...')
one_col, one_rows = 0, 0
while True:
while True:
for i in range(one_rows, len(m)):
if m[i, one_col]:
break
else:
one_col += 1
if one_col >= len(primes):
break
continue
break
if one_col >= len(primes):
break
assert m[i, one_col]
assert np.all(m[i, :one_col] == 0)
for j in range(len(m)):
if i == j:
continue
if not m[j, one_col]:
continue
m[j][...] ^= m[i][...]
SwapRows(one_rows, i)
one_rows += 1
one_col += 1
assert np.all(m[one_rows:, :len(primes)] == 0)
zeros = m[one_rows:, len(primes):]
Print(f'Even combinations ({len(m) - one_rows}):')
Print(MatToStr(zeros))
return zeros
def ProcessResults(N, zs, la_zeros):
import math
Print('Computing final results...')
factors = []
for i in range(len(la_zeros)):
zero = la_zeros[i]
assert len(zero) == len(zs)
cz = []
for j in range(len(zero)):
if not zero[j]:
continue
z = zs[j]
z2 = z * z - N
cz.append((z, z2, FactorRef(z2)))
a = 1
for z, z2, fs in cz:
a = (a * z) % N
cnts = {}
for z, z2, fs in cz:
for p, c in fs.items():
cnts[p] = cnts.get(p, 0) + c
cnts = dict(sorted(cnts.items()))
b = 1
for p, c in cnts.items():
assert c % 2 == 0, (p, c, cnts)
b = (b * pow(p, c // 2, N)) % N
factor = math.gcd(a + b, N)
Print('a', str(a).rjust(len(str(N))), ' b', str(b).rjust(len(str(N))), ' factor', factor if factor != N else 'N')
if factor != 1 and factor != N:
factors.append(factor)
return factors
def SieveCollectResults(N, its):
import time, threading, queue, traceback, math
K = len(its)
qs = [queue.Queue() for i in range(K)]
last_dot, finish = False, False
def Get(it, ty, need, compul):
nonlocal last_dot, finish
try:
cnt = 0
for iz, z in enumerate(it):
if finish:
break
if iz < 4:
z2 = z * z - N
Print(('\n' if last_dot else '') + 'Sieve_' + ('C', 'B', 'A')[K - 1 - ty], ' iz', iz,
'z', z, 'z^2', z2, f'(2^{math.log2(max(1, z2)):>6.2f})', ', z^2 factors', FactorRef(z2))
last_dot = False
else:
Print(('.', 'b', 'a')[K - 1 - ty], end = '')
last_dot = True
qs[ty].put(z)
cnt += 1
if cnt >= need:
break
except:
Print(traceback.format_exc())
thr = []
for ty, (it, need, compul) in enumerate(its):
thr.append(threading.Thread(target = Get, args = (it, ty, need, compul), daemon = True))
thr[-1].start()
for ithr, t in enumerate(thr):
if its[ithr][2]:
t.join()
finish = True
if last_dot:
Print()
zs = [[] for i in range(K)]
for iq, q in enumerate(qs):
while not qs[iq].empty():
zs[iq].append(qs[iq].get())
return zs
def DixonFactor(N):
import time, math, numpy as np, sys
B = 1 << 10
primes = GenPrimes_SieveOfEratosthenes(B)
Print('Num primes', len(primes), 'last prime', primes[-1])
IsSquare(0)
it = SieveQuadratic(N, primes)
qprimes = next(it)
zs = SieveCollectResults(N, [
(SieveFactor(N, primes), 3, False),
(it, round(len(qprimes) * 1.06 + 0.5), True),
])[-1]
la_zeros = LinAlg(N, zs, qprimes)
fs = ProcessResults(N, zs, la_zeros)
if len(fs) > 0:
Print('Factored, factors', sorted(set(fs)))
else:
Print('Failed to factor! Try running program again...')
def IsPrime_Fermat(n, *, ntrials = 32):
import random
if n <= 16:
return n in (2, 3, 5, 7, 11, 13)
for i in range(ntrials):
if pow(random.randint(2, n - 2), n - 1, n) != 1:
return False
return True
def GenRandom(bits):
import random
return random.randrange(1 << (bits - 1), 1 << bits)
def RandPrime(bits):
while True:
n = GenRandom(bits) | 1
if IsPrime_Fermat(n):
return n
def Main():
import math
bits = 64
N = RandPrime(bits // 2) * RandPrime((bits + 1) // 2)
Print('N to factor', N, f'(2^{math.log2(N):>5.1f})')
DixonFactor(N)
if __name__ == '__main__':
Main()
控制台输出:
N to factor 10086068308526249063 (2^ 63.1)
Num primes 172 last prime 1021
Pre-computing squares filter...
Quadratic Sieve of B-smooth z^2...
Factor Sieve of B-smooth z^2...
QSieve num primes 78 (45.3%)
Block 0 (2^ 32.14)...
Sieve_C iz 0 z 3175858067 z^2 6153202727426 (2^ 42.48) , z^2 factors {2: 1, 29: 2, 67: 1, 191: 1, 487: 1, 587: 1}
Sieve_C iz 1 z 3175859246 z^2 13641877439453 (2^ 43.63) , z^2 factors {31: 1, 61: 1, 167: 1, 179: 1, 373: 1, 647: 1}
Sieve_C iz 2 z 3175863276 z^2 39239319203113 (2^ 45.16) , z^2 factors {31: 1, 109: 1, 163: 1, 277: 1, 311: 1, 827: 1}
Sieve_C iz 3 z 3175867115 z^2 63623612174162 (2^ 45.85) , z^2 factors {2: 1, 29: 1, 41: 1, 47: 1, 61: 1, 127: 1, 197: 1, 373: 1}
.........................................................................
Sieve_B iz 0 z 3175858067 z^2 6153202727426 (2^ 42.48) , z^2 factors {2: 1, 29: 2, 67: 1, 191: 1, 487: 1, 587: 1}
......
Linear algebra...
Factoring...
Gaussian elemination...
Even combinations (7):
01000000000000000000000000000000000000000000000000001100000000000000000000000000000
11010100000010000100100000010011100000000001001001001001011001000000110001010000000
11001011000101111100011111001011010011000111101000001001011000001111100101001110000
11010010010000110110101100110101000100001100010011100011101000100010011011001001000
00010110111010000010000010000111010001010010111001000011011011101110110001001100100
00000010111000110010100110001111010101001000011010110011101000110001101101100100010
10010001111111101100011110111110110100000110111011010001010001100000010100000100001
Computing final results...
a 9990591196683978238 b 9990591196683978238 factor 1
a 936902490212600845 b 3051457985176300292 factor 3960321451
a 1072293684177681642 b 8576178744296269655 factor 2546780213
a 1578121372922149955 b 1578121372922149955 factor 1
a 2036768191033218175 b 8049300117493030888 factor N
a 1489997751586754228 b 2231890938565281666 factor 3960321451
a 9673227070299809069 b 3412883990935144956 factor 3960321451
Factored, factors [2546780213, 3960321451]