摘要
您好,假设您有两个不同的独立的64位二进制矩阵A
和T
(T
是另一个以转置形式存储的矩阵,在乘法过程中使用矩阵的转置版本可以对T
的行而不是列进行操作,这对于二进制算术非常方便),您想要将这些矩阵相乘。唯一需要注意的是矩阵乘积结果被截断为64位,如果在某些特定的矩阵单元格中得到大于1
的值,则结果矩阵单元格将包含1
,否则为0
。
示例
A T
00000001 01111101
01010100 01100101
10010111 00010100
10110000 00011000 <-- This matrix is transposed
11000100 00111110
10000011 10101111
11110101 11000100
10100000 01100010
二进制和传统乘法的结果:
Binary Traditional
11000100 11000100
11111111 32212121
11111111 32213421
11111111 21112211
11101111 22101231
11001111 11001311
11111111 54213432
11001111 11001211
问题
如何以最有效的方式按上述描述相乘这些矩阵?
P.S
我尝试利用二进制 and
(即 &
运算符)而不是对单独位进行乘法运算,在这种情况下,我必须为乘法准备数据:
ulong u;
u = T & 0xFF;
u = (u << 00) + (u << 08) + (u << 16) + (u << 24)
+ (u << 32) + (u << 40) + (u << 48) + (u << 56);
现在,通过对两个整数A
和u
执行二进制and
操作,将产生以下结果:
A u R C
00000001 01111101 00000001 1
01010100 01111101 01010100 3
10010111 01111101 00010101 3
10110000 01111101 00110000 2
11000100 01111101 01000100 2
10000011 01111101 00000001 1
11110101 01111101 01110101 5
10100000 01111101 00100000 1
在上面的示例中,
R
包含 A
位乘以 u
位的结果,为了得到最终值,我们必须对一行中的所有位进行求和。请注意,列 C
包含与上述结果为 Traditional
矩阵乘法的第一列中找到的值相等的值。问题在于,在此步骤中,我必须操作单独的位,我认为这是次优的方法,我已经通过http://graphics.stanford.edu/~seander/bithacks.html 查找了一种并行执行的方法,但没有成功,如果有任何人有关于如何将位于 R
列中的值“展平”和“合并”到结果 64 位矩阵中的任何想法,我将不胜感激,期待你的回复。感谢您,
编辑
非常感谢David Eisenstat,最终算法如下:var A = ...;
var T = ...; // T == transpose(t), t is original matrix, algorithm works with transposed matrix
var D = 0x8040201008040201UL;
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D);
以下代码段:
public static void Main (string[] args){
ulong U;
var Random = new Xor128 ();
var timer = DateTime.Now;
var A = Random.As<IUniformRandom<UInt64>>().Evaluate();
var T = Random.As<IUniformRandom<UInt64>>().Evaluate();
var steps = 10000000;
for (var i = 0; i < steps; i++) {
ulong r = 0;
var d = 0x8040201008040201UL;
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d);
}
Console.WriteLine (DateTime.Now - timer);
var m1 = new Int32[8,8];
var m2 = new Int32[8,8];
var m3 = new Int32[8,8];
for (int row = 0; row < 8; row++) {
for (int col = 0; col < 8; col++) {
m1 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
m2 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
m3 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
}
}
timer = DateTime.Now;
for (int i = 0; i < steps; i++) {
for (int row = 0; row < 8; row++) {
for (int col = 0; col < 8; col++) {
var sum = 0;
for (int temp = 0; temp < 8; temp++) {
sum += m1 [row, temp] * m2 [temp, row];
}
m3 [row, col] = sum;
}
}
}
Console.WriteLine (DateTime.Now - timer);
}
展示以下结果给我:
00:00:02.4035870
00:00:57.5147150
感谢大家,这意味着在Mac OS X/Mono下的性能提升了23倍。